How do I speed up MySQL updates other than looping?

I have to read 460,000 records from one database and update those records in another database. Currently, I read all of the records in (select * from...) and then loop through them sending an update command to the second database for each record. This process is slower than I hoped and I was wondering if there is a faster way. I match up the records by the one column that is indexed (primary key) in the table.
Thanks.

I would probably optimize the fetch size for reads (e.g. setFetchSize(250)) and use JDBC batch processing for writes (e.g. a batch size of 250 records).

I am assuming your "other database" is on a separate server, so can't just be directly joined.
The key is to have fewer update statements. It can often be faster to insert your data into a new table like this:
create table updatevalues ( id int(11), a int(11), b int(11), c int(11) );
insert into updatevalues (id,a,b,c) values (1,1,2,3),(2,4,5,6),(3,7,8,9),...
update updatevalues u inner join targettable t using (id) set t.a=u.a,t.b=u.b,t.c=u.c;
drop table updatevalues;
(Batch the inserts into as many rows per statement as will fit under your configured maximum packet size, the max_allowed_packet setting, which is usually in the megabytes.)
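To check what that limit is on your server:
SHOW VARIABLES LIKE 'max_allowed_packet';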
Alternatively, find unique values and update them together:
update targettable set a=42 where id in (1,3,7);
update targettable set a=97 where id in (2,5);
...
update targettable set b=1 where id in (1,7);
...

1. USE MULTI QUERY
Ah, 'another db' means a remote database. In that case you should reduce the number of round trips to the remote DB. I suggest using multi query, e.g. to execute 1,000 UPDATEs at once:
$cnt = 1;
$multi_query = "";
foreach ($rows as $row)
{
    $multi_query .= "UPDATE ..;";
    if ($cnt % 1000 == 0)
    {
        mysqli_multi_query($link, $multi_query); // $link is your mysqli connection
        $multi_query = "";
    }
    ++$cnt;
}
if ($multi_query != "")
    mysqli_multi_query($link, $multi_query); // send whatever is left over
The multi query feature is normally disabled (for security reasons). To use multi query:
PHP : http://www.php.net/manual/en/mysqli.quickstart.multiple-statement.php
C API : http://dev.mysql.com/doc/refman/5.0/en/c-api-multiple-queries.html
VB : http://www.devart.com/dotconnect/mysql/docs/MultiQuery.html (I'm not a VB user, so I'm not sure this covers multi query for VB)
2. USE Prepared Statement
(If you are already using prepared statements, skip this.)
You are running 460K identically structured queries, so using a PREPARED STATEMENT gives you two advantages.
Reduced query compile time:
Without prepared statements, every query is compiled; with them, compilation happens only once.
Reduced network cost:
Assuming each UPDATE query is 100 bytes long and has 4 parameters (each 4 bytes long):
without prepared statements: 100 bytes * 460K = 46 MB
with prepared statements: 16 bytes * 460K = 7.3 MB
Not a dramatic reduction, but it helps.
Here is how to use prepared statement in VB.
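And here is a minimal sketch of the same idea in MySQL's server-side PREPARE/EXECUTE syntax (client prepared-statement APIs wrap the same mechanism; targettable, a and id are the sample names from the earlier answer):
PREPARE upd FROM 'UPDATE targettable SET a = ? WHERE id = ?';
SET @a = 42, @id = 1;
EXECUTE upd USING @a, @id;
SET @a = 97, @id = 2; -- rebind and re-execute; no recompile
EXECUTE upd USING @a, @id;
DEALLOCATE PREPARE upd;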

What I ended up doing was using a loop to concatenate my queries together. So instead of sending one query at a time, I would send a group at a time separated by semicolons:
update sometable set x=1 where y =2; update sometable set x = 5 where y = 6; etc...
This ended up improving my time by about 40%. My update went from 3 min 23 secs to 2 min 1 second.
But there was a threshold, where concatenating too many together started to slow it down again when the string got too long. So I had to tweak it until I found just the right mix. It ended up being 100 strings concatenated together that gave the best performance.
Thanks for the responses.

Related

how to reduce (or eliminate) mysql "table does not exist" errors when renaming a table to a new version of itself [duplicate]

This question already has an answer here:
How to rename two tables in one atomic operation in MySQL
Getting some "myTable does not exist" errors while renaming a table to an updated version of itself. Not sure if I'm doing it right.
I have a web site where users run queries against a table that is replaced once every 5 minutes with an updated copy of itself. The table has 600,000 rows and needs to be built from scratch once every few minutes so that it is internally consistent.
This is how I do the table update:
// not shown: bunch of code to build newTable from scratch; takes 90 seconds
// while this is happening users are querying myTable
// Then, on a 5 minute mark, this happens:
START TRANSACTION
RENAME TABLE myTable TO oldTable // fast, like 0.005 seconds
RENAME TABLE newTable TO myTable // fast, like 0.005 seconds
COMMIT
DROP TABLE oldTable // a bit slow... like 0.5 to 1.0 seconds
I put the DROP outside the transaction because I'm trying to minimize the time when myTable doesn't exist.
During this transition period (which happens every 5 min) I'm getting 1 to 3 MySQL errors: "myTable does not exist".
I'm not sure if some users are just starting a query exactly during the time when myTable has been renamed (and therefore does not exist) before newTable has been renamed to myTable? It's a pretty tiny window; I think the transaction takes 0.01 seconds and there are maybe 20-30 users on the site at one time (according to Google Analytics) running queries.
Or maybe there are some longer queries in progress just before I rename myTable to oldTable? Does a query from another thread fail if you take its table away in another thread?
Should I even be using START TRANSACTION / COMMIT for this use case?
All tables are InnoDB. MySQL version is "Ver 8.0.22-0ubuntu0.20.04.3 for Linux on x86_64 ((Ubuntu))".
Any suggestions on how I can get rid of the "myTable does not exist" errors while I'm in the middle of renaming the tables once every 5 minutes?
I'm not sure if this will work for your use case, but one way you could achieve the same result without rebuilding the table every few minutes would be like this (a sketch follows below):
Only use one table, but add a column to it that defines the version number or something similar.
In your 'rebuild' step, add rows to the table with a new version id.
While this is happening, query the table for the previous version id.
At the 5 minute mark, start querying for the new version id.
Remove all rows from the table that have the old version id.
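A rough sketch of that approach, with placeholder names (sourceData, the columns, and the version numbers are made up for illustration):
-- rebuild step: load the new snapshot under a new version id
INSERT INTO myTable (col1, col2, version)
SELECT col1, col2, 42 FROM sourceData;
-- readers keep filtering on the current version id meanwhile
SELECT col1, col2 FROM myTable WHERE version = 41;
-- at the 5 minute mark, point readers at version 42, then clean up
DELETE FROM myTable WHERE version = 41;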
Updated:
I think using two tables with the same structure, between which you can switch, would solve your problem. Let's say we have TableA and TableB, and currently your application is reading data from TableA. Once you receive a new snapshot and process the data, insert it into the unused table (TableB).
Once the data is inserted, you may switch your reference from TableA to TableB.
You may perform a SELECT with LIMIT 1 to get a timestamp from both tables, to check which one contains stale data and should be overwritten. The same logic can be used to reference the correct table.
It's similar to having two OS partitions and switching between the two.
The same logic can be abstracted into a stored procedure, which you can call to insert or fetch data; the stored procedure decides which table to reference. This also reduces the number of DB calls from the application.
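A sketch of that timestamp check (this assumes each table carries a loaded_at column recording when the snapshot was loaded, which is an assumption on my part):
(SELECT 'tableA' AS tbl, loaded_at FROM tableA LIMIT 1)
UNION ALL
(SELECT 'tableB' AS tbl, loaded_at FROM tableB LIMIT 1)
ORDER BY loaded_at; -- the row with the older timestamp is the stale table to overwrite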
To start with,
START TRANSACTION won't help, as RENAME TABLE is a DDL statement and causes an implicit commit.
For reference, the syntax is (current name first, then the new name):
RENAME TABLE currentName TO newName;
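Better, both renames can be combined into a single RENAME TABLE statement, which MySQL performs atomically, so there is no window in which myTable doesn't exist (this is the approach from the linked duplicate question):
RENAME TABLE myTable TO oldTable, newTable TO myTable;
DROP TABLE oldTable;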
You can also use a stored procedure that first checks whether the table exists before renaming, so the rename silently does nothing instead of raising an error:
DELIMITER $$
CREATE PROCEDURE SP_RENAME_TABLE(IN newTable VARCHAR(64), IN oldTable VARCHAR(64))
BEGIN
IF EXISTS(
SELECT 1 FROM information_schema.tables
WHERE
table_schema = DATABASE() AND
table_name = oldTable LIMIT 1)
THEN
SET @query = CONCAT('RENAME TABLE ', oldTable, ' TO ', newTable, ';');
PREPARE stmt FROM @query;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
END IF;
END$$
DELIMITER ;
CALL SP_RENAME_TABLE('new_table_name', 'old_table_name');

Update MySQL table in chunks

I am trying to update a MySQL InnoDB table with c. 100 million rows. The query takes close to an hour, which is not a problem.
However, I'd like to split this update into smaller chunks in order not to block table access. This update does not have to be an isolated transaction.
At the same time, the splitting of the update should not be too expensive in terms of additional overhead.
I considered looping through the table in a procedure using:
UPDATE TABLENAME SET NEWVAR = <expression> LIMIT batchsize, offset
but UPDATE does not support an offset in its LIMIT clause in MySQL.
I understand I could try to UPDATE ranges of data that are SELECTed on a key, together with the LIMIT option, but that seems rather complicated for that simple task.
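For reference, a single iteration of that key-range idea would look something like this (someTable, theKey and newVar are placeholder names; the procedure below automates exactly this loop):
-- find where the next chunk of 500000 keys ends
SELECT MAX(theKey) INTO @maxkey
FROM (SELECT theKey FROM someTable WHERE theKey > @minkey ORDER BY theKey LIMIT 500000) AS tmp;
-- update only that key range
UPDATE someTable SET newVar = <expression> WHERE theKey > @minkey AND theKey <= @maxkey;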
I ended up with the procedure listed below. It works, but I am not sure whether it is efficient given all the queries needed to identify consecutive ranges. It can be called with the following arguments (example):
call chunkUpdate('SET var=0','someTable','theKey',500000);
Basically, the first argument is the update command (e.g. something like "SET x = ..."), followed by the MySQL table name, followed by the name of a numeric (integer) key column that has to be unique, followed by the size of the chunks to be processed. The key should have an index for reasonable performance. The @n variable and the SELECT statements in the code below can be removed; they are only for debugging.
delimiter //
CREATE PROCEDURE chunkUpdate (IN cmd VARCHAR(255), IN tab VARCHAR(255), IN ky VARCHAR(255), IN sz INT)
BEGIN
SET @sqlgetmin = CONCAT("SELECT MIN(",ky,")-1 INTO @minkey FROM ",tab);
SET @sqlgetmax = CONCAT("SELECT MAX(",ky,") INTO @maxkey FROM ( SELECT ",ky," FROM ",tab," WHERE ",ky,">@minkey ORDER BY ",ky," LIMIT ",sz,") AS TMP");
SET @sqlstatement = CONCAT("UPDATE ",tab," ",cmd," WHERE ",ky,">@minkey AND ",ky,"<=@maxkey");
SET @n=1;
PREPARE getmin FROM @sqlgetmin;
PREPARE getmax FROM @sqlgetmax;
PREPARE statement FROM @sqlstatement;
EXECUTE getmin;
REPEAT
EXECUTE getmax;
SELECT cmd, @n AS step, @minkey AS min, @maxkey AS max;
EXECUTE statement;
SET @minkey=@maxkey;
SET @n=@n+1;
UNTIL @maxkey IS NULL
END REPEAT;
SELECT CONCAT(cmd, " EXECUTED IN ",@n," STEPS") AS MESSAGE;
END//
delimiter ;

mysql stored procedure is 20 times slower than standard query

I have 10 tables with the same structure except for the table name.
I have an SP (stored procedure) defined as follows:
select * from table1 where (@param1 IS NULL OR col1=@param1)
UNION ALL
select * from table2 where (@param1 IS NULL OR col1=@param1)
UNION ALL
...
...
UNION ALL
select * from table10 where (@param1 IS NULL OR col1=@param1)
I am calling the SP with the following line:
call mySP('test') // executes in 6.836s
Then I opened a new standard query window, copied the query above, and replaced @param1 with 'test'.
This executed in 0.321s, about 20 times faster than the stored procedure.
I changed the parameter value repeatedly to prevent the result from being cached, but this did not change the outcome: the SP is about 20 times slower than the equivalent standard query.
Can you help me figure out why this is happening?
Did anybody encounter similar issues?
I am using MySQL 5.0.51 on Windows Server 2008 R2 64-bit.
Edit: I am using Navicat for testing.
Any idea will be helpful to me.
EDIT1:
I have done some tests according to Barmar's answer.
Finally, I changed the SP to just one row:
SELECT * FROM table1 WHERE col1=@param1 AND col2=@param2
First I executed the standard query:
SELECT * FROM table1 WHERE col1='test' AND col2='test' // executed in 0.020s
Then I called my SP:
CALL MySp('test','test') // executed in 0.466s
So I changed the WHERE clause entirely, but nothing changed. I also called the SP from the MySQL command window instead of Navicat; it gave the same result. I am still stuck on it.
My SP DDL:
CREATE DEFINER = `myDbName`@`%`
PROCEDURE `MySP` (param1 VARCHAR(100), param2 VARCHAR(100))
BEGIN
SELECT * FROM table1 WHERE col1=param1 AND col2=param2;
END
col1 and col2 have a combined index.
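For reference, the index is defined something like this (using the name that appears in the FORCE INDEX hint below):
ALTER TABLE table1 ADD INDEX col1_col2_combined_index (col1, col2);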
You might ask: why don't you use a standard query then? My software design is not suitable for this; I must use a stored procedure. So this problem is highly important to me.
EDIT2:
I have obtained the query profiles. The big difference is the "Sending data" step in the SP profile; it takes 99% of the query execution time. I am doing the tests on a local database server, not connecting from a remote computer.
(SP and standard-query profile screenshots not reproduced here.)
I tried a FORCE INDEX hint in my SP, but got the same result:
SELECT * FROM table1 FORCE INDEX (col1_col2_combined_index) WHERE col1=@param1 AND col2=@param2
I then changed the SP to:
EXPLAIN SELECT * FROM table1 FORCE INDEX (col1_col2_combined_index) WHERE col1=param1 AND col2=param2
This gave this result:
id: 1
select_type: SIMPLE
table: table1
type: ref
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 292004
Extra: Using where
Then i have executed the query below.
EXPLAIN SELECT * FROM table1 WHERE col1='test' AND col2='test'
Result is:
id: 1
select_type: SIMPLE
table: table1
type: ref
possible_keys: col1_col2_combined_index
key: col1_col2_combined_index
key_len: 76
ref: const,const
rows: 292004
Extra: Using where
I am using the FORCE INDEX hint in the SP, but it insists on not using the index. Any ideas? I think I am close to the end. :)
Just a guess:
When you run the query by hand, the expression WHERE ('test' IS NULL OR COL1 = 'test') can be optimized when the query is being parsed. The parser can see that the string 'test' is not null, so it simplifies the test to WHERE COL1 = 'test'. And if there's an index on COL1, this will be used.
However, when you create a stored procedure, parsing occurs when the procedure is created. At that time, it doesn't know what the parameter will be, and has to implement the query as a sequential scan of the table.
Try changing your procedure to:
IF param1 IS NULL THEN
  SELECT * FROM table1
  UNION ALL
  SELECT * FROM table2
  UNION ALL
  ...;
ELSE
  SELECT * FROM table1 WHERE col1 = param1
  UNION ALL
  SELECT * FROM table2 WHERE col1 = param1
  UNION ALL
  ...;
END IF;
I don't have much experience with MySQL stored procedures, so I'm not sure that's all the right syntax.
Possible character set issue? If your table character set is different from your database character set, this may be causing a problem.
See this bug report: http://bugs.mysql.com/bug.php?id=26224
[12 Nov 2007 21:32] Mark Kubacki: Still no luck with 5.1.22-rc: keys are ignored; the query takes 36 seconds within a procedure and 0.12s outside.
[12 Nov 2007 22:30] Mark Kubacki: After having changed the charsets to UTF-8 (especially for the two used), which is used for the connection anyway, keys are taken into account within the stored procedure!
The question I cannot answer is: why does the optimizer treat charset conversions differently within and outside stored procedures? (Indeed, I might be wrong to ask this.)
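A hedged way to check for such a mismatch and align the table with the connection (utf8 here matches the bug report; use whatever character set your connection actually uses):
SHOW VARIABLES LIKE 'character_set%';
SHOW CREATE TABLE table1;
ALTER TABLE table1 CONVERT TO CHARACTER SET utf8;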
Interesting question, because I am fond of using stored procedures; the reasons are maintenance and the encapsulation principle.
This is information I found:
http://dev.mysql.com/doc/refman/5.1/en/query-cache-operation.html
It states that the query cache is not used for queries that
1. are a subquery that belongs to an outer query, and
2. are executed within the body of a stored procedure, trigger or event.
This implies that it works as designed.
I had seen this behavior, but it wasn't related to the character set.
I had a table that held self-referencing hierarchical data (a parent with children, and some children had children of their own, etc.). Since the parent_id had to reference the primary id's (and the column specified a constraint to that effect), I couldn't set the parent id to NULL or 0 (zero) to disassociate a child from a parent, so I simply referenced it to itself.
When I went to run a stored procedure to perform the recursive query to find all children (at all levels) of a particular parent, the query took between 30 & 40 times as long to run. I found that altering the query used by the stored procedure to make sure it excluded the top-level parent record (by specifying WHERE parent_id != id) restored the performance of the query.
The stored procedure I'm using is based on the one shown in:
https://stackoverflow.com/questions/27013093/recursive-query-emulation-in-mysql.

MySQL - Executing intensive queries on live server

I'm having some issues updating and inserting millions of rows in a MySQL database. I need to flag 50 million rows in Table A, insert some data from the flagged 50 million rows into Table B, then update those same 50 million rows in Table A again. There are about 130 million rows in Table A and 80 million in Table B.
This needs to happen on a live server without denying access to other queries from the website. The problem is while this stored procedure is running, other queries from the website end up locked and the HTTP request times out.
Here's the gist of the SP, simplified a little for illustration purposes:
CREATE DEFINER=`user`@`localhost` PROCEDURE `MyProcedure`(
totalLimit int
)
BEGIN
SET @totalLimit = totalLimit;
/* Prepare new rows to be issued */
PREPARE STMT FROM 'UPDATE tableA SET `status` = "Being-Issued" WHERE `status` = "Available" LIMIT ?';
EXECUTE STMT USING @totalLimit;
/* Insert new rows for usage into tableB */
INSERT INTO tableB (/* my fields */)
SELECT /* some values from TableA */
FROM tableA
WHERE `status` = "Being-Issued";
/* Set rows as being issued */
UPDATE tableA SET `status` = 'Issued' WHERE `status` = 'Being-Issued';
END$$
DELIMITER ;
Processing 50M rows three times will be slow irrespective of what you're doing.
Make sure your updates are affecting smaller-sized, disjoint sets. And execute them one by one, rather than all within the same transaction.
If you're doing this already and MySQL is misbehaving, try this slight tweak to your code, sketched here as concrete SQL under the assumption that tableA has an integer primary key id:
CREATE TEMPORARY TABLE tmp_rows (id INT PRIMARY KEY);

START TRANSACTION;
INSERT INTO tmp_rows
SELECT id FROM tableA WHERE `status` = 'Available' LIMIT 10000 FOR UPDATE;
UPDATE tableA JOIN tmp_rows USING (id) SET `status` = 'Being-Issued';
COMMIT;

START TRANSACTION;
INSERT INTO tableB (/* my fields */)
SELECT /* some values from tableA */ FROM tableA JOIN tmp_rows USING (id);
UPDATE tableA JOIN tmp_rows USING (id) SET `status` = 'Issued';
COMMIT;
This should keep locks for a minimal time.
What about this? It basically calls the original stored procedure in a loop until the total amount needed is reached, with a sleep period in between calls (like 2 seconds) to allow other queries to process.
increment is the amount to do at one time (using 10,000 in this case)
totalLimit is the total amount to be processed
sleepSec is the amount of time to rest between calls
BEGIN
SET @x = 0;
REPEAT
SELECT SLEEP(sleepSec);
SET @x = @x + increment;
CALL OriginalProcedure( increment );
UNTIL @x >= totalLimit
END REPEAT;
END$$
Obviously it could use a little math to make sure the increment doesn't go over the total limit if it's not evenly divisible, but it appears to work (by "work" I mean it allows other queries to still be processed from web requests), and it seems faster overall as well.
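One sketch of that math, reusing the same names as above: cap the amount passed on each call at whatever remains, so the final call processes only the leftover:
REPEAT
SELECT SLEEP(sleepSec);
CALL OriginalProcedure( LEAST(increment, totalLimit - @x) );
SET @x = @x + increment;
UNTIL @x >= totalLimit
END REPEAT;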
Any insight here? Is this a good idea? Bad idea?

MySQL UPDATE and SELECT in one pass

I have a MySQL table of tasks to perform, each row having parameters for a single task.
There are many worker apps (possibly on different machines), performing tasks in a loop.
The apps access the database using MySQL's native C APIs.
In order to own a task, an app does something like that:
Generate a globally-unique id (for simplicity, let's say it is a number)
UPDATE tasks
SET guid = %d
WHERE guid = 0 LIMIT 1
SELECT params
FROM tasks
WHERE guid = %d
If the last query returns a row, we own it and have the parameters to run
Is there a way to achieve the same effect (i.e. 'own' a row and get its parameters) in a single call to the server?
Try something like this:
UPDATE `lastid` SET `idnum` = (SELECT `id` FROM `history` ORDER BY `id` DESC LIMIT 1);
The above code worked for me.
You may create a procedure that does it:
CREATE PROCEDURE prc_get_task (in_guid BINARY(16), OUT out_params VARCHAR(200))
BEGIN
DECLARE task_id INT;
SELECT id, params
INTO task_id, out_params
FROM tasks
WHERE guid = 0
LIMIT 1
FOR UPDATE;
UPDATE tasks
SET guid = in_guid
WHERE id = task_id;
END;
START TRANSACTION;
CALL prc_get_task(@guid, @params);
COMMIT;
If you are looking for a single query, then it can't happen. The UPDATE statement returns just the number of rows that were updated, and the SELECT statement doesn't alter a table, it only returns values.
Using a procedure will indeed turn it into a single call, and it can be handy if locking is a concern for you. If your biggest concern is network traffic (i.e. passing too many queries), use the procedure. If your concern is server overload (i.e. the DB is working too hard), the extra overhead of a procedure could make things worse.
I have the exact same issue. We ended up using PostgreSQL instead, and UPDATE ... RETURNING:
The optional RETURNING clause causes UPDATE to compute and return value(s) based on each row actually updated. Any expression using the table's columns, and/or columns of other tables mentioned in FROM, can be computed. The new (post-update) values of the table's columns are used. The syntax of the RETURNING list is identical to that of the output list of SELECT.
Example: UPDATE my_table SET status = 1 WHERE id = (SELECT id FROM my_table WHERE status = 0 LIMIT 1) RETURNING *; (PostgreSQL has no LIMIT clause on UPDATE, so a subquery picks the single row; this assumes an id primary key.)
Or, in your case: UPDATE tasks SET guid = %d WHERE id = (SELECT id FROM tasks WHERE guid = 0 LIMIT 1) RETURNING params;
Sorry, I know this doesn't answer the question for MySQL, and it might not be easy to just switch to PostgreSQL, but it's the best way we've found to do it. Even 6 years later, MySQL still doesn't support UPDATE ... RETURNING. It might be added at some point in the future, but for now MariaDB only has RETURNING for DELETE statements.
Edit: There is a (low priority) task to add UPDATE ... RETURNING support to MariaDB.
I don't know about the single-call part, but what you're describing is a lock. Locks are an essential element of relational databases.
I don't know the specifics of locking a row, reading it, and then updating it in MySQL, but with a bit of reading of the MySQL lock documentation you could do all kinds of lock-based manipulations.
The Postgres documentation of locks has a great example describing exactly what you want to do: lock the table, read the table, modify the table.
UPDATE tasks
SET guid = %d, params = @params := params
WHERE guid = 0 LIMIT 1;
It will return 1 or 0, depending on whether the values were effectively changed.
SELECT @params AS params;
This one just selects the variable from the connection.