I recently wrote a simple Java program that processed some data and inserted it in a MyISAM table. About 35000 rows had to be inserted. I wrote the INSERT statement using INSERT ... SET syntax and executed it for all rows with PreparedStatement.executeBatch(). So:
String sql = "INSERT INTO my_table"
+ " SET "
+ " my_column_1 = ? "
+ " my_column_2 = ? "
...
+ " my_column_n = ? ";
try (PreparedStatement pst = con.prepareStatement(sql)) {
    for (Object o : someCollection) {
        pst.setInt(1, ...);
        pst.setInt(2, ...);
        ...
        pst.setInt(n, ...);
        pst.addBatch();
    }
    pst.executeBatch();
}
I tried inserting all rows in a single batch and in batches of 1000, but in all cases the execution was VERY slow (about 1 minute per 1000 rows). After some tinkering I found that changing the syntax to INSERT ... VALUES improved the speed dramatically, by 100x at the very least (I didn't measure it accurately).
String sql = "INSERT INTO my_table (my_column_1, my_column_2, ... , my_column_n)"
+ " VALUES (?, ?, ... , ?)";
What's going on here? Can it be that the JDBC driver cannot rewrite the batches when using INSERT ... SET? I didn't find any documentation about this. I am creating my connections with options rewriteBatchedStatements=true&useServerPrepStmts=false.
I first noticed this problem when accessing a database on another host. That is, I had used the INSERT ... SET approach before without any noticeable performance issue in applications running on the same host as the database. So I guess the problem may be that many more statements are sent over the network with INSERT ... SET than with INSERT ... VALUES.
If you examine the INSERT ... SET syntax, you'll see it's only meant for inserting a single row. INSERT ... VALUES is meant for inserting multiple rows at one time.
In other words - even though you set rewriteBatchedStatements=true, the JDBC driver can't optimize the SET variation like it can with the VALUES variation because SET is not built for the batch case you have. Use VALUES to compress N inserts into one.
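For illustration, here is a minimal sketch of the VALUES-form batch (made-up table and column names, not the asker's actual code) that the driver is able to collapse into one multi-row statement when rewriteBatchedStatements=true:
// Sketch: VALUES-form batch that Connector/J can rewrite into a single multi-row INSERT.
String url = "jdbc:mysql://host:3306/db?rewriteBatchedStatements=true&useServerPrepStmts=false";
try (Connection con = DriverManager.getConnection(url, "user", "pass");
     PreparedStatement pst = con.prepareStatement(
             "INSERT INTO my_table (my_column_1, my_column_2) VALUES (?, ?)")) {
    for (int i = 0; i < 1000; i++) {
        pst.setInt(1, i);
        pst.setInt(2, i * 2);
        pst.addBatch();
    }
    pst.executeBatch(); // sent as one INSERT ... VALUES (...),(...),... over the wire
}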
Bonus tip - If you use ON DUPLICATE KEY UPDATE, the JDBC currently can't rewrite those statements either. (edit: This statement is false - my mistake.)
There's an option you can set to verify all of this for yourself (I think it's 'profileSQL').
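For example (a sketch; profileSQL is a Connector/J property that makes the driver log the SQL it actually sends, so you can check whether the batch really was rewritten):
String url = "jdbc:mysql://host:3306/db"
        + "?rewriteBatchedStatements=true"
        + "&useServerPrepStmts=false"
        + "&profileSQL=true"; // logs each statement the driver sends
Connection con = DriverManager.getConnection(url, "user", "pass");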
Related
I have a program that inserts thousands of records in a MySql DB. The operation cannot be done in bulk for a variety of reasons. Overall, the operations are very slow.
After looking at the SQL that is being generated, I can see that EF is calling a select to get the Id of the recently inserted record.
SET SESSION sql_mode = 'ANSI';

INSERT INTO `table` ('blah') VALUES (86613784);

SELECT `id`
FROM `table`
WHERE row_count() > 0 AND `id` = last_insert_id()
Since I don't need that Id, how can I tell EF to avoid the call and save me the time?
FYI - I am already using the following statements to speed things up as well.
Configuration.ValidateOnSaveEnabled = false;
Configuration.AutoDetectChangesEnabled = false;
As requested, here is the code used to create the record. Not much to it, but if it helps...
using (var ctx = new Tc_TrademarkEntities(_entityFrameworkConnectionString))
{
    ctx.case_file.Add(request.Trademark);
    ctx.SaveChanges();
}
I am trying to execute a SELECT ... FOR UPDATE query using Laravel 3:
SELECT * from projects where id = 1 FOR UPDATE;
UPDATE projects SET money = money + 10 where id = 1;
I have tried several things for several hours now:
DB::connection()->pdo->exec($query);
and
DB::query($query)
I have also tried adding START TRANSACTION; ... COMMIT; to the query
and I tried to separate the SELECT from the UPDATE in two different parts like this:
DB::query($select);
DB::query($update);
Sometimes I get 0 rows affected, sometimes I get an error like this one:
SQLSTATE[HY000]: General error: 2014 Cannot execute queries while other unbuffered queries are active. Consider using PDOStatement::fetchAll(). Alternatively, if your code is only ever going to run against mysql, you may enable query buffering by setting the PDO::MYSQL_ATTR_USE_BUFFERED_QUERY attribute.
SQL: UPDATE `sessions` SET `last_activity` = ?, `data` = ? WHERE `id` = ?
I want to lock the row in order to update sensitive data, using Laravel's database connection.
Thanks.
If all you need to do is increase money by 10, you don't need to lock the row before the update. Simply executing the update query will do the job. The SELECT query will only slow down your script and doesn't help in this case.
UPDATE projects SET money = money + 10 where id = 1;
I would definitely use different queries, so you have control over what you are doing.
I would use a transaction.
If we read this simple explanation, PDO transactions are quite straightforward. It gives us this simple but complete example, which illustrates that everything works as we would expect (consider $db to be your DB::connection()->pdo).
try {
    $db->beginTransaction();
    $db->exec("SOME QUERY");

    $stmt = $db->prepare("SOME OTHER QUERY?");
    $stmt->execute(array($value));

    $stmt = $db->prepare("YET ANOTHER QUERY??");
    $stmt->execute(array($value2, $value3));

    $db->commit();
}
catch (PDOException $ex) {
    // Something went wrong, roll back!
    $db->rollBack();
    echo $ex->getMessage();
}
Let's go to your real statements. For the first of them, the SELECT ..., I wouldn't use exec but query, since as stated here:
PDO::exec() does not return results from a SELECT statement. For a
SELECT statement that you only need to issue once during your program,
consider issuing PDO::query(). For a statement that you need to issue
multiple times, prepare a PDOStatement object with PDO::prepare() and
issue the statement with PDOStatement::execute().
And assign its result to some temp variable like
$result = $db->query($select);
After this execution, I would call $result->fetchAll() or $result->closeCursor(), since as we can read here:
If you do not fetch all of the data in a result set before issuing
your next call to PDO::query(), your call may fail. Call
PDOStatement::closeCursor() to release the database resources
associated with the PDOStatement object before issuing your next call
to PDO::query().
Then you can exec the update
$result = $db->exec($update);
And after all of this, just in case, I would again call fetchAll() or closeCursor() on any open statement (note that $db->exec() itself returns only the affected row count).
If the aim is
to lock the row in order to update sensitive data, using Laravel's database connection.
Maybe you can use PDO transactions:
DB::connection()->pdo->beginTransaction();
DB::connection()->pdo->commit();
DB::connection()->pdo->rollBack();
Has anyone seen a DBI-type module for Perl which capitalizes, easily, on MySQL's multi-insert syntax
insert into TBL (col1, col2, col3) values (1,2,3),(4,5,6),...?
I've not yet found an interface which allows me to do that. The only thing I HAVE found is looping through my array. That method seems a lot less optimal than throwing everything into a single statement and letting MySQL handle it. I've not found any documentation out there (i.e. via Google) which sheds light on this, short of rolling my own code to do it.
TIA
There are two approaches. You can insert (?, ?, ?) a number of times based on the size of the array. The text manipulation would be something like:
my $sql_values = join( ', ', ('(?, ?, ?)') x scalar(@array) );
Then flatten the array for calling execute(). I would avoid this way because of the thorny string and array manipulation that needs to be done.
The other way is to begin a transaction, then run a single insert statement multiple times.
my $sql = 'INSERT INTO tbl (col1, col2, col3) VALUES (?, ?, ?)';
$dbh->{AutoCommit} = 0;
my $sth = $dbh->prepare_cached( $sql );
$sth->execute( @$_ ) for @array;
$sth->finish;
$dbh->{AutoCommit} = 1;
This is a bit slower than the first method, but it still avoids reparsing the statement. It also avoids the subtle manipulations of the first solution, while still being atomic and allowing disk I/O to be optimized.
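For readers coming from the JDBC questions in this thread, the same idea (one transaction, one prepared statement executed per row) looks roughly like this in Java; this is only a sketch with made-up table and column names, not part of the original answer:
// Sketch: transaction plus a reused prepared statement, the JDBC analogue of the Perl code above.
con.setAutoCommit(false);
try (PreparedStatement ps = con.prepareStatement(
        "INSERT INTO tbl (col1, col2, col3) VALUES (?, ?, ?)")) {
    for (int[] row : rows) { // 'rows' is a hypothetical List<int[]>
        ps.setInt(1, row[0]);
        ps.setInt(2, row[1]);
        ps.setInt(3, row[2]);
        ps.executeUpdate();
    }
    con.commit();
} catch (SQLException e) {
    con.rollback(); // undo the partial work on failure
    throw e;
} finally {
    con.setAutoCommit(true);
}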
If DBD::mysql supported DBI's execute_for_fetch (see DBI's execute_array and execute_for_fetch), this would be the typical usage scenario: you have multiple rows of inserts/updates/deletes available now and want to send them in one go (or in batches). I've no idea if the mysql client libs support sending multiple rows of bound parameters in one go, but most other database client libs do and can take advantage of DBI's execute_array/execute_for_fetch. Unfortunately, few DBDs actually implement execute_array/execute_for_fetch themselves; most rely on DBI's default implementation, which just executes one row at a time.
Jim,
Frezik has it. That is probably the most optimal:
my $sth = $dbh->prepare( 'INSERT INTO tbl (col1, col2, col3) VALUES (?, ?, ?)' );
foreach (@array) { $sth->execute( @{$_} ); }
$sth->finish;
Besides the advantage of escaping values when using bind values in PDO, is there any difference in performance between using bind values with multiple sets of values (prepare the statement once, but execute it multiple times with different values) and a single insert statement like
INSERT INTO table_name VALUES (value1, value2, value3),(value1, value2, value3),(value1, value2, value3)
Did some tests myself on 100,000 records. For a simpler scenario I did not use INSERT INTO but REPLACE INTO to avoid having to come up with new keys every time.
REPLACE INTO
A raw REPLACE INTO of 3 columns, for example REPLACE INTO table_name VALUES (value1, value2, value3),(value1, value2, value3),(value1, value2, value3)......, took approx. 14 seconds for 100,000 rows.
NORMAL BIND
Preparing the statement, binding the values, and executing the prepared statement for each row took around 33 seconds:
foreach ($vars as $var) {
    $stmt->bindValue(':a' . $var["value1"], $var["value2"]);
    $stmt->bindValue(':b' . $var["value3"], $var["value4"]);
    $stmt->bindValue(':c' . $var["value5"], $var["value6"]);
    $stmt->execute();
}
BIND BUT 1 EXECUTE
Creating a long statement before preparing it, binding all the parameters, and executing it once took around 22 seconds:
REPLACE INTO clientSettings (clientId, settingName, settingValue) VALUES
(:a1,:b1,:c1),
(:a2,:b2,:c2),
(:a3,:b3,:c3),
(:a4,:b4,:c4),
.......
Note that these are rough numbers, obtained with REPLACE INTO (where existing rows are deleted and re-inserted) on 100,000 records.
It's faster (for MySQL) if you use prepared statements. That way, the actual SQL is parsed once and the data is sent multiple times, so the SQL layer that transforms your INSERT INTO ... isn't invoked every time you want to execute that particular insert; it's parsed only once, and then you just send the different parameters (the different data) that you want to insert (or use for any other operation).
So not only does it reduce overhead, it increases security as well (if you use PDO::bindValue/param due to proper escaping based on driver / charset being used).
In short - yes, your insert will be faster and safer. But by what margin - it's hard to tell.
I need to insert a couple hundred million records into the MySQL DB. I'm batch inserting them 1 million at a time. Please see my code below. It seems to be slow. Is there any way to optimize it?
try {
    // Disable auto-commit
    connection.setAutoCommit(false);

    // Create a prepared statement
    String sql = "INSERT INTO mytable (xxx) VALUES (?)";
    PreparedStatement pstmt = connection.prepareStatement(sql);

    Object[] vals = set.toArray();
    for (int i = 0; i < vals.length; i++) {
        pstmt.setString(1, vals[i].toString());
        pstmt.addBatch();
    }

    // Execute the batch
    int[] updateCounts = pstmt.executeBatch();
    System.out.append("inserted " + updateCounts.length);
I had a similar performance issue with mysql and solved it by setting the useServerPrepStmts and the rewriteBatchedStatements properties in the connection url.
Connection c = DriverManager.getConnection("jdbc:mysql://host:3306/db?useServerPrepStmts=false&rewriteBatchedStatements=true", "username", "password");
I'd like to expand on Bertil's answer, as I've been experimenting with the connection URL parameters.
rewriteBatchedStatements=true is the important parameter. useServerPrepStmts is already false by default, and even changing it to true doesn't make much difference in terms of batch insert performance.
Now I think it's time to explain how rewriteBatchedStatements=true improves the performance so dramatically. It does so by rewriting batched prepared INSERT statements into a multi-value insert when executeBatch() is called (Source). That means that instead of sending the following n INSERT statements to the mysql server each time executeBatch() is called:
INSERT INTO X VALUES (A1,B1,C1)
INSERT INTO X VALUES (A2,B2,C2)
...
INSERT INTO X VALUES (An,Bn,Cn)
It would send a single INSERT statement :
INSERT INTO X VALUES (A1,B1,C1),(A2,B2,C2),...,(An,Bn,Cn)
You can observe it by turning on the MySQL general log (with SET GLOBAL general_log = 1), which logs into a file each statement sent to the mysql server.
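If you'd rather toggle that log from the application side, a sketch like the following works, assuming the connecting user has the privilege to set global variables; the log file path is just an example:
// Sketch: enable the general query log around the batch, then switch it off again.
try (Statement st = con.createStatement()) {
    st.execute("SET GLOBAL general_log_file = '/tmp/mysql-general.log'"); // example path
    st.execute("SET GLOBAL general_log = 1");
}
// ... run executeBatch() here and inspect the log file ...
try (Statement st = con.createStatement()) {
    st.execute("SET GLOBAL general_log = 0");
}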
You can insert multiple rows with one INSERT statement, and doing a few thousand at a time can greatly speed things up. That is, instead of doing e.g. 3 inserts of the form INSERT INTO tbl_name (a,b,c) VALUES(1,2,3);, you do INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(1,2,3),(1,2,3);. (It might be that JDBC's .addBatch() does a similar optimization now - the mysql addBatch used to be entirely un-optimized and just issued individual queries anyway - I don't know if that's still the case with recent drivers.)
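A rough sketch of building such a multi-row statement by hand through JDBC (placeholder table and column names; splitting the work into chunks of a few thousand rows is left out for brevity):
// Build one INSERT with a (?, ?, ?) group per row, then bind all the values.
StringBuilder sql = new StringBuilder("INSERT INTO tbl_name (a, b, c) VALUES ");
for (int i = 0; i < rows.size(); i++) {
    sql.append(i == 0 ? "(?, ?, ?)" : ", (?, ?, ?)");
}
try (PreparedStatement ps = con.prepareStatement(sql.toString())) {
    int idx = 1;
    for (int[] row : rows) { // 'rows' is a hypothetical List<int[]>
        ps.setInt(idx++, row[0]);
        ps.setInt(idx++, row[1]);
        ps.setInt(idx++, row[2]);
    }
    ps.executeUpdate();
}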
If you really need speed, load your data from a comma-separated file with LOAD DATA INFILE; we get around a 7-8x speedup doing that versus doing tens of millions of inserts.
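Issued through JDBC, that could look roughly like this (a sketch; the file name is made up, and LOCAL loading must be allowed on both the server and the driver, e.g. allowLoadLocalInfile=true on the connection URL):
// Bulk load a CSV file instead of running millions of INSERT statements.
try (Statement st = con.createStatement()) {
    st.execute("LOAD DATA LOCAL INFILE 'data.csv' "
             + "INTO TABLE tbl_name "
             + "FIELDS TERMINATED BY ',' "
             + "LINES TERMINATED BY '\\n'");
}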
If:
It's a new table, or the amount to be inserted is greater than the data already inserted
There are indexes on the table
You do not need other access to the table during the insert
Then ALTER TABLE tbl_name DISABLE KEYS can greatly improve the speed of your inserts. When you're done, run ALTER TABLE tbl_name ENABLE KEYS to start building the indexes, which can take a while, but not nearly as long as doing it for every insert.
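In JDBC terms that could look something like this (a sketch; tbl_name is a placeholder, and DISABLE KEYS only affects non-unique indexes on MyISAM tables):
// Pause index maintenance during the bulk insert, then rebuild the indexes once at the end.
try (Statement st = con.createStatement()) {
    st.execute("ALTER TABLE tbl_name DISABLE KEYS");
}
// ... run the batched INSERTs here ...
try (Statement st = con.createStatement()) {
    st.execute("ALTER TABLE tbl_name ENABLE KEYS"); // index rebuild happens here
}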
You may try using a DDBulkLoad object.
// Get a DDBulkLoad object
DDBulkLoad bulkLoad = DDBulkLoadFactory.getInstance(connection);
bulkLoad.setTableName("mytable");
bulkLoad.load("data.csv");
try {
    // Disable auto-commit
    connection.setAutoCommit(false);
    int maxInsertBatch = 10000;

    // Create a prepared statement
    String sql = "INSERT INTO mytable (xxx) VALUES (?)";
    PreparedStatement pstmt = connection.prepareStatement(sql);

    Object[] vals = set.toArray();
    int count = 1;
    for (int i = 0; i < vals.length; i++) {
        pstmt.setString(1, vals[i].toString());
        pstmt.addBatch();
        // Flush the batch every maxInsertBatch rows instead of building one huge batch
        if (count % maxInsertBatch == 0) {
            pstmt.executeBatch();
        }
        count++;
    }

    // Execute the remaining batch
    pstmt.executeBatch();
    System.out.append("inserted " + count);