Is there a smart way to mass UPDATE in MySQL?

I have a table that needs regular updating, and these updates happen in batches. Unlike with INSERT, I can't just include multiple rows in a single query. What I do now is prepare the UPDATE statement, then loop through all the items and execute it for each. The preparation happens only once, but there are still a lot of executions.
I created several versions of the table at different sizes (thinking that maybe better indexing or splitting the table would help), but that had no effect on update times: 100 updates take about 4 seconds for either a 1,000-row table or a 500,000-row one.
Is there a smarter way of doing this faster?
As asked in the comments, here is the actual code (PHP) I have been testing with. Column 'id' is the primary key.
$stmt = $dblink->prepare("UPDATE my_table SET col1 = ?, col2 = ? WHERE id = ?");
// bind_param() binds by reference, so reassigning $c1/$c2/$id below is
// picked up automatically by each execute()
$rc = $stmt->bind_param("ssi", $c1, $c2, $id);
foreach ($items as $item) {
    $c1 = $item['c1'];
    $c2 = $item['c2'];
    $id = $item['id'];
    $rc = $stmt->execute();
}
$stmt->close();

If you really want to do it all in one big statement, a kludgy way is to use the ON DUPLICATE KEY UPDATE functionality of INSERT, even though all the rows should already exist, so the duplicate-key update will fire for every single row.
INSERT INTO table (a,b,c) VALUES (1,2,3),(4,5,6)
ON DUPLICATE KEY UPDATE a=VALUES(a), b=VALUES(b), c=VALUES(c);
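Applied to the table in the question (where id is the primary key), a minimal sketch could look like this; the literal values are placeholders:
INSERT INTO my_table (id, col1, col2)
VALUES
    (1, 'a1', 'b1'),
    (2, 'a2', 'b2'),
    (3, 'a3', 'b3')
ON DUPLICATE KEY UPDATE
    col1 = VALUES(col1),
    col2 = VALUES(col2);
Since every id already exists, each row takes the UPDATE branch, and the whole batch travels as a single statement.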

Try LOAD DATA INFILE. It is much faster than individual MySQL INSERTs or UPDATEs, as long as you can get the data into a flat-file format.
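LOAD DATA INFILE cannot update rows by itself, but a common pattern is to load the fresh values into a staging table and then apply them with a single multi-table UPDATE. A sketch, with assumed table and file names:
CREATE TEMPORARY TABLE my_table_stage (
    id   INT PRIMARY KEY,
    col1 VARCHAR(255),
    col2 VARCHAR(255)
);

LOAD DATA INFILE '/tmp/updates.csv'
INTO TABLE my_table_stage
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
(id, col1, col2);

UPDATE my_table t
JOIN my_table_stage s ON s.id = t.id
SET t.col1 = s.col1,
    t.col2 = s.col2;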

Related

How to UPDATE many rows (1 500 000) fast

I have a table with 1.5 million rows and 47k values to update.
I've tried two ways of doing it, and both are pretty slow.
The first is 47k statements of the form
UPDATE $table
SET name = '$name'
WHERE id = '$id'
The second builds one giant CASE expression:
$prefix = "UPDATE $table SET name = (CASE ";
$mid = '';
foreach ($updates as $id => $name) { // $updates: id => name pairs (assumed shape)
    $mid .= "WHEN id = '$id' THEN '$name' ";
}
// Without an ELSE, every row whose id is not listed would get name = NULL,
// since the UPDATE has no WHERE clause; ELSE name keeps the current value.
$suffix = "ELSE name END);";
$query = $prefix . $mid . $suffix;
Is there a way of doing it faster? Maybe with LOAD DATA INFILE ? Can't figure out the UPDATE syntax with this one.
I had to import large files on a daily basis, and tried all sorts of things.
In the end I got the best performance with a specific combination of:
First copy the CSV to the database server, and load it from local disk there, instead of loading the CSV from your client machine.
Make sure you have a table structure that exactly matches the file. I've used a temporary table for the import, and then used separate queries on that to get the data into the final table.
No foreign keys or unique-index checks on the tmp table.
That will speed things up a lot already. If you need to squeeze out more performance, you can increase the InnoDB log buffer size (innodb_log_buffer_size).
And obviously:
make sure that you don't import stuff you don't need. Be critical about which fields and which rows you include.
If a column has only a few distinct text values, use a numeric value for it instead.
Do you really need 8 decimals in your floats?
Are you repeatedly importing the same data, when you could import only the changes?
Make sure that you don't trigger unnecessary type conversions during import. Prepare your data to be as close as possible to the table you're importing into.
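Putting those tips together, a rough sketch of the staged import (table, column, and file names are all assumptions; innodb_log_buffer_size is a server setting, raised in configuration rather than per-session):
SET foreign_key_checks = 0;   -- skip FK validation during the bulk phase
SET unique_checks = 0;

-- staging table: no foreign keys, no unique indexes
CREATE TABLE import_tmp (
    id   INT,
    name VARCHAR(100)
) ENGINE=InnoDB;

-- the CSV was copied to the database server, so no LOCAL needed
LOAD DATA INFILE '/var/lib/mysql-files/daily.csv'
INTO TABLE import_tmp
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';

-- separate query to move the data into the final table
INSERT INTO final_table (id, name)
SELECT id, name FROM import_tmp;

SET unique_checks = 1;
SET foreign_key_checks = 1;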

MySQL and queries performance

Question regarding best practices for queries and Perl.
Is there a performance issue, or is it bad practice, to have a select inside a Perl for loop? Is it a problem to send so many selects to the DB in rapid fire?
The code is quasi-pseudocode:
# @line has 5000 elements
foreach my $elem ( @line ) {
    SQL = INSERT IGNORE INTO <table> ( column1, .. , column10 ) VALUES ( 'a', .. , 'j' )
}
What about deletes and/or updates?
foreach my $elem ( @line ) {
    my $UN = substr($elem, 0, 10 );
    SQL = UPDATE <table> SET <column> = $UN;
}
foreach my $elem ( @line ) {
    my $UN = substr($elem, 0, 10 );
    SQL = DELETE FROM <table> WHERE <column> = $UN
}
Also, a question in the same arena: I have 5000 items to check, and my database has anywhere from 1 to 5000 elements at any given time. Is it acceptable to loop through my 5000 items in Perl and delete each ID in the database, or should there first be a check to see whether the ID exists before issuing the DELETE command?
foreach my $elem ( @line ) {
    $ID = substr( $elem, 5, 0 );
    SQL = DELETE FROM <table> WHERE id = $ID;
}
or should it be something like:
foreach my $elem ( @line ) {
    $ID = substr( $elem, 5, 0 );
    SQL = DELETE FROM <table> WHERE id = $ID if ID exists;
}
Thanks,
--Eherr
As for inserts in rapid succession: not a problem. The server is built to handle that.
Caution should be taken with INSERT IGNORE for other reasons, though: program logic that ought to react to a failure never gets the chance, because the failure was just ignored.
As for the particular UPDATE you showed, it does not make much sense in a loop (or perhaps at all), because it has no WHERE clause. Why loop, say, 1000 times, with each iteration updating every row in the table? Maybe that was just a typo.
As for deletes, there is no problem running them in a loop either, in general. If you are looking to empty a table, look into TRUNCATE TABLE: it is faster, and not logged, if that is ever a desire. Note, though, that TRUNCATE is disallowed on a table that is the referenced table in a foreign key constraint.
Other general comments: take care that any referential integrity that is (or should be) in place is honored; an INSERT IGNORE, UPDATE, or DELETE can fail due to foreign key constraints. Also, checking for the existence of a row that you are about to delete anyway is overkill: the DELETE already walks down a B-tree to find the row, so a prior check just does that work twice, and on a table scan it is even more painful.
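In other words, the DELETE itself already reports whether the row was there, so a prior lookup buys nothing. A minimal sketch (table name and id value are placeholders):
DELETE FROM mytable WHERE id = 42;
SELECT ROW_COUNT();   -- 1 if the row existed and was deleted, 0 if it did not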
Lastly, when you are doing a massive bulk insert, loops in any programming language never match the performance of LOAD DATA INFILE. Several of your peers have seen 8-to-10-hour operations reduced to 2 minutes by using LOAD DATA (references available if you ask).
MySQL manual pages:
Referential Integrity with Foreign Key Constraints
Quickly clearing tables with TRUNCATE TABLE
Bulk inserts with LOAD DATA INFILE
In my opinion, firing many separate queries is a bit slow; it is better to construct a single UPDATE, INSERT, SELECT, or DELETE query and fire it once.
A few tips before choosing between multiple queries and a single query:
1) If the database is configured to kill any query that takes longer than a specified time, a single very large query can get killed.
2) If a user is waiting for the response, use pagination: fetch a few records now and the rest later, but not one by one.
5000 queries into a database shouldn't be a performance bottleneck. You're fine. You can always benchmark a read-only run.

I need to find out who last updated a mysql database

I can get a last update time from TABLES in information_schema. Can I get a USER who updated the database or a table?
As Amadan mentioned, I'm pretty sure there isn't a way to do this unless you record it yourself. However, that is pretty straightforward: whenever you perform an UPDATE query, also log the user (and any other relevant information you want to record) to a separate table with an additional query. Something like this will work (written in PHP since you didn't specify a language, but the MySQL part carries over anywhere):
// The update query
$stmt = $db->prepare("UPDATE table SET `col` = ? WHERE `col` = ?");
$stmt->execute(array($var1, $var2));
// Something in table has just been updated; record user's id and time of update
$stmt = $db->prepare("INSERT INTO log (userid, `time`) VALUES (?, NOW())");
$stmt->execute(array($userid));
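For completeness, a sketch of the log table that snippet writes to (any columns beyond userid and time are up to you; the names here are assumptions):
CREATE TABLE log (
    log_id INT NOT NULL AUTO_INCREMENT,
    userid INT NOT NULL,
    `time` DATETIME NOT NULL,
    PRIMARY KEY (log_id)
);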

INSERT Batch, and if duplicate key Update in codeigniter

Is there any way of performing a batch INSERT query in CodeIgniter that UPDATEs a row if the key already exists?
I have gone through the documentation and found only insert_batch and update_batch. But how do you update a row with a duplicate key using Active Record? And what happens if one row fails to be inserted or updated in insert_batch: does the whole insertion fail, or only that row?
You will have to go with a little custom query, adding an ON DUPLICATE KEY UPDATE clause:
$sql = $this->db->insert_string('YourTable', $data) . ' ON DUPLICATE KEY UPDATE duplicate=duplicate+1';
$this->db->query($sql);
$id = $this->db->insert_id();
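Roughly, the statement that builds is the following (column names are for illustration only):
INSERT INTO YourTable (col1, col2) VALUES ('val1', 'val2')
ON DUPLICATE KEY UPDATE duplicate = duplicate + 1;
One caveat: when the UPDATE branch fires, insert_id() will not point at the existing row unless you also add id = LAST_INSERT_ID(id) to the update list.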
Also, please check this out; it may give you a better solution.
For Codeigniter 3, you can use this library. This allows you to be able to supply an array of key-value pairs to be inserted into separate rows and will auto-update data if a duplicate key occurs in DB.
https://github.com/amir-ranglerz/CodeIgniter-Insert-Batch-on-DUPLICATE-KEY-Update

How to check if a value in the MySQL DB exists?

I have a mySQL database and I have a Perl script that connects to it and performs some data manipulation.
One of the tables in the DB looks like this:
CREATE TABLE `mydb`.`companies` (
`company_id` INT NOT NULL AUTO_INCREMENT,
`company_name` VARCHAR(100) NULL ,
PRIMARY KEY (`company_id`) );
I want to insert some data in this table. The problem is that some companies in the data can be repeated.
The question is: how do I check whether the company_name already exists? If it exists, I need to retrieve its company_id and use it to insert the data into another table. If it does not, the info should be entered into this table, but I already have that code.
Here is an additional requirement: the script can be run multiple times simultaneously, so I can't just read the data into a hash and check whether it already exists.
I could throw in an additional SELECT query, but that creates an additional hit on the DB.
I tried to look for an answer, but every question here and every thread on the web talks about checking the primary key. I don't need that. The DB structure is already set, but I can make changes if need be. This table will be used as an additional table.
Is there another way? Either in the DB or in Perl.
"The script can be run multiple times simultaneously, so I can't just read the data into the hash and check if it already exist."
It sounds like your biggest concern is that one instance of the script may insert a new company name while another script is running. The two scripts may check the DB for the existence of that company name when it doesn't exist, and then they might both insert the data, resulting in a duplicate.
Assuming I'm understanding your problem correctly, you need to look at transactions. You need to be able to check for the data and insert the data before anyone else is allowed to check for that data. That will keep a second instance of the script from checking for data until the 1st instance is done checking AND inserting.
Check out: http://dev.mysql.com/doc/refman/5.1/en/innodb-transaction-model.html
And: http://dev.mysql.com/doc/refman/5.1/en/commit.html
MyISAM doesn't support transactions. InnoDB does. So you need to make sure your table is InnoDB. Start your set of queries with START TRANSACTION.
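A sketch of that approach (assuming InnoDB and, ideally, a unique index on company_name; note that locking reads on a missing row take gap locks, which can deadlock under heavy concurrent inserts):
START TRANSACTION;
-- blocks other transactions probing the same name until we commit
SELECT company_id FROM companies WHERE company_name = 'Acme Ltd' FOR UPDATE;
-- if no row came back, insert it
INSERT INTO companies (company_name) VALUES ('Acme Ltd');
COMMIT;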
Alternatively, you could do this, if you have a unique index on company_name (which you should).
$query_string = "INSERT INTO `companies` (NULL,'$company_name')";
This will result in an error if the company_name already exists. Try a sample run attempting to insert a duplicate company name. In PHP,
$result = mysql_query($query_string);
$result will equal false on error. So,
if (!$result) {
    $query2 = "INSERT INTO `other_table` VALUES (NULL,'$company_name')";
    $result2 = mysql_query($query2);
}
If you have a unique key on company_name in both tables, then MySQL will not allow you to insert duplicates. Your multiple scripts may spend a lot of time trying to insert duplicates, but they will not succeed.
EDIT: continuing from the above code, and doing your work for you, here is what you would do if the insert was successful.
if (!$result) {
    $query2 = "INSERT INTO `other_table` VALUES (NULL,'$company_name')";
    $result2 = mysql_query($query2);
} else {
    // the insert succeeded, so grab the new company's id
    $last_id = mysql_insert_id();
    $query2 = "UPDATE `other_table` SET `some_column` = 'some_value' WHERE `id` = '$last_id'";
    // OR, maybe you want this query
    // $query2a = "INSERT INTO `other_table` (`id`,`foreign_key_id`) VALUES (NULL,'$last_id')";
}
I suggest you write a stored procedure (STP) that takes the company name as input.
In this STP, first check for an existing company name. If it exists, return its id; otherwise, insert it and return the new id.
This way, you hit the DB only once.
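A sketch of such a procedure, assuming a unique index on company_name (as recommended above); the LAST_INSERT_ID(expr) trick lets one statement cover both the insert and the already-exists case:
DELIMITER //
CREATE PROCEDURE get_or_create_company(IN p_name VARCHAR(100))
BEGIN
    -- on a duplicate name, re-point LAST_INSERT_ID() at the existing row
    INSERT INTO companies (company_name) VALUES (p_name)
    ON DUPLICATE KEY UPDATE company_id = LAST_INSERT_ID(company_id);
    SELECT LAST_INSERT_ID() AS company_id;
END //
DELIMITER ;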
For InnoDB, use a transaction. For MyISAM, lock the table, do the modifications, and unlock.
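A minimal sketch of the MyISAM route (the company name is a placeholder):
LOCK TABLES companies WRITE;
SELECT company_id FROM companies WHERE company_name = 'Acme Ltd';
-- if no row came back:
INSERT INTO companies (company_name) VALUES ('Acme Ltd');
UNLOCK TABLES;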