MySQL slow inserts

I have a MySQL 8 database server running locally on a decent-spec gaming laptop.
When I try to insert a number of records (500k) stored in an in-memory structure in Java via the following code, it is extremely slow: we are talking about maybe 1,000 records per minute at most.
I don't really know where to look for debug information or what metrics I should provide here to help answer this post, so if you have any guidance or additional information I can supply, please do let me know.
I've temporarily worked around this by saving the data to an in-memory H2 database, but really I'd like to persist it and query it at leisure using MySQL Workbench.
PreparedStatement ps = conn.prepareStatement(insertSQL);
ArrayList<String> stocks = new ArrayList<String>();
for (Spread spread : spreads) {
    ps.setInt(1, spread.buyFeedId);
    ps.setInt(2, spread.sellFeedId);
    ps.setString(3, spread.stock);
    ps.setString(4, spread.buyExchange);
    ps.setString(5, spread.sellExchange);
    ps.setTimestamp(6, new Timestamp(spread.spreadDateTime.toInstant().toEpochMilli()));
    ps.setDouble(7, spread.buyPrice);
    ps.setDouble(8, spread.sellPrice);
    ps.setDouble(9, spread.diff);
    ps.setInt(10, spread.askSize);
    ps.setInt(11, spread.bidSize);
    ps.execute();
}

In PHP, the PDO library lets you run all the inserts in a single transaction:
$sql = 'my sql statement';
$Conn = DAO::getConnection();
$stmt = $Conn->prepare($sql);
$Conn->beginTransaction();
foreach($data as $row)
{
    // now loop through each inner array to match bound values
    foreach($row as $column => $value)
    {
        $stmt->bindValue(':' . $column, $value, PDOUtils::getPDOParam($value));
    }
    $stmt->execute();
}
$Conn->commit();
In your case, with 1000+ inserts, only one transaction would be needed. I'm not a Java developer, but there is surely an equivalent.
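A minimal JDBC sketch of that equivalent (it assumes the question's Spread class, insertSQL string, and an already-open connection; it is not a drop-in implementation):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Sketch: same idea as the PDO example above -- one transaction around all the inserts.
static void insertAllInOneTransaction(Connection conn, String insertSQL,
                                      List<Spread> spreads) throws SQLException {
    conn.setAutoCommit(false);                    // start a single transaction
    try (PreparedStatement ps = conn.prepareStatement(insertSQL)) {
        for (Spread spread : spreads) {
            ps.setInt(1, spread.buyFeedId);
            ps.setInt(2, spread.sellFeedId);
            // ... bind the remaining parameters exactly as in the question ...
            ps.execute();
        }
        conn.commit();                            // one commit instead of one per row
    } catch (SQLException e) {
        conn.rollback();
        throw e;
    } finally {
        conn.setAutoCommit(true);
    }
}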

If you have all 500k records handy and you just want to insert them, don't do the insert operation row by row; use bulk-copy (BCP-style) functionality instead.
Read about mysqlimport or the LOAD DATA INFILE command and try implementing one of them.
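For example, if the 500k records were first written out to a CSV file, a rough Java sketch of driving LOAD DATA LOCAL INFILE would be (the file path, table name, and connection URL are made up; local_infile must be enabled on the server and allowLoadLocalInfile=true set on the Connector/J URL):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch only: bulk-load a pre-written CSV file instead of inserting row by row.
static void bulkLoadCsv() throws SQLException {
    String url = "jdbc:mysql://localhost:3306/mydb?allowLoadLocalInfile=true";
    try (Connection conn = DriverManager.getConnection(url, "user", "pass");
         Statement st = conn.createStatement()) {
        st.execute("LOAD DATA LOCAL INFILE '/tmp/spreads.csv' " +
                   "INTO TABLE spread " +
                   "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'");
    }
}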

Batch insert the rows. That is, build a single INSERT for each 100-1000 rows. This will run about 10 times as fast. Using a "transaction" is a less effective way of batching.
Here's one discussion of Java + Batch: https://www.viralpatel.net/batch-insert-in-java-jdbc/
Search this for other possible answers on this site:
site:stackoverflow.com java MySQL batch INSERT
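In JDBC the usual way to get this effect is addBatch()/executeBatch(), ideally with rewriteBatchedStatements=true so that Connector/J rewrites each batch into a multi-row INSERT. A sketch reusing the question's names (the connection URL and batch size are assumptions):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Sketch: send the inserts in batches of 1000 instead of one network round trip per row.
static void batchInsert(String insertSQL, List<Spread> spreads) throws SQLException {
    String url = "jdbc:mysql://localhost:3306/mydb?rewriteBatchedStatements=true";
    try (Connection conn = DriverManager.getConnection(url, "user", "pass");
         PreparedStatement ps = conn.prepareStatement(insertSQL)) {
        conn.setAutoCommit(false);
        int n = 0;
        for (Spread spread : spreads) {
            ps.setInt(1, spread.buyFeedId);
            // ... bind the remaining parameters as in the question ...
            ps.addBatch();
            if (++n % 1000 == 0) {
                ps.executeBatch();   // flush every 1000 rows
            }
        }
        ps.executeBatch();           // flush the final partial batch
        conn.commit();
    }
}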
Another speedup is to change to
innodb_flush_log_at_trx_commit = 2
That is a speed vs reliability tradeoff -- in favor of speed.
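If you just want to try that setting for the duration of a bulk load without editing my.cnf, it is a dynamic variable, so something like the following works (a sketch only; it requires SUPER/SYSTEM_VARIABLES_ADMIN and affects the whole server):

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch: relax redo-log flushing during a bulk load (speed over durability).
static void relaxLogFlushing(Connection conn) throws SQLException {
    try (Statement st = conn.createStatement()) {
        st.execute("SET GLOBAL innodb_flush_log_at_trx_commit = 2");
    }
}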

Related

Use same mysqli prepared statement for different queries?

During some testing, a little question popped up. When I code database updates, I usually do it via callbacks written in PHP, to which I simply pass a given mysqli connection object as a function argument. Executing all queries (for example, three queries) across the same single connection proved to be much faster than closing and reopening a DB connection for each query in a sequence. This also works easily with SQL transactions; the connection can be passed along to callbacks without any issues.
My question is: can you also do this with prepared statement objects? What I mean is, assuming we have successfully established a $conn object representing the mysqli connection, is stuff like this legit?
function select_users( $users_id, $stmt ) {
    $sql = "SELECT username FROM users where ID = ?";
    mysqli_stmt_prepare( $stmt, $sql );
    mysqli_stmt_bind_param( $stmt, "i", $users_id );
    mysqli_stmt_execute( $stmt );
    return mysqli_stmt_get_result( $stmt );
}
function select_labels( $artist, $stmt ) {
    $sql = "SELECT label FROM labels where artist = ?";
    mysqli_stmt_prepare( $stmt, $sql );
    mysqli_stmt_bind_param( $stmt, "s", $artist );
    mysqli_stmt_execute( $stmt );
    return mysqli_stmt_get_result( $stmt );
}
$stmt = mysqli_stmt_init( $conn );
$users = select_users( 1, $stmt );
$rappers = select_labels( "rapperxyz", $stmt );
Or is it bad practice, and you should rather use:
$stmt_users = mysqli_stmt_init( $conn );
$stmt_rappers = mysqli_stmt_init( $conn );
$users = select_users( 1, $stmt_users );
$rappers = select_labels( "rapperxyz", $stmt_rappers );
During testing I noticed that passing a single statement object along to callbacks works for server calls where I run about four not-too-complicated DB queries via the four corresponding callbacks in a row.
When I do a server call with about ten different queries, however, I sometimes (yes, only sometimes, for pretty much the same data across the different executions, which seems like weird behavior to me) get the error "Commands out of sync; you can't run this command now" and some other errors I've never seen before, such as the number of variables not matching the number of parameters, although they match perfectly when I check them. The only fix I found after some research was indeed to use a different statement object for each callback. So I just wondered: should you actually ALWAYS use ONE prepared statement object for ONE query, which you then may execute N times in a row?
Yes.
The "commands out of sync" error is because MySQL protocol is not like http. You can't send requests any time you want. There is state on the server-side (i.e. mysqld) that is expecting a certain sequence of requests. This is what's known as a stateful protocol.
Compare with a protocol like ftp. You can do an ls in an ftp client, but the list of files you get back depends on the current working directory. If you were sharing that ftp client connection among multiple functions in your app, you don't know that another function hasn't changed the working directory. So you can't be sure the file list you get from ls represents the directory you thought you were in.
In MySQL too, there's state on the server-side. You can only have one transaction open at a time. You can only have one query executing at a time. The MySQL client does not allow you to execute a new query where there are still rows to be fetched from an in-progress query. See Commands out of sync in the MySQL doc on common errors.
So if you pass your statement handle around to some callback functions, how can that function know it's safe to execute the statement?
IMO, the only safe way to use a statement is to use it immediately.
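The same principle holds outside PHP. In JDBC terms (Java, like the first question on this page), "use it immediately" means preparing, executing, and consuming the statement in one scope; the users table and column names here are invented for illustration:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Sketch: prepare it, use it, close it -- no statement state leaks to the next query.
static String findUsername(Connection conn, int userId) throws SQLException {
    String sql = "SELECT username FROM users WHERE id = ?";
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        ps.setInt(1, userId);
        try (ResultSet rs = ps.executeQuery()) {
            return rs.next() ? rs.getString("username") : null;
        }
    } // the statement is closed here
}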

Yii Framework - InnoDB vs MyISAM

I have a question: I have built a big application with Yii and InnoDB, and I ran into the problem that inserts/updates take a really, really long time. Here is my PHP report:
INNODB:
admin User update 55.247464895248 seconds
ekuskov User update 13.282548904419 seconds
doriwall User update 0.002094030380249 seconds
MYISAM:
admin User update 7.8317859172821 seconds
ekuskov User update 1.6304929256439 seconds
doriwall User update 0.0020859241485596 seconds
Can anyone suggest some solution to speed up the insert/update?
EDIT ----------------------------------------------
Now I used a very simple insert loop:
public function run($args) {
    $time = -microtime(true);
    $begin = DateTime::createFromFormat('Y-m-d H:i:s', '2010-01-01 00:00:00');
    $end = DateTime::createFromFormat('Y-m-d H:i:s', '2013-01-01 00:00:00');
    $end->add(new DateInterval('P1D'));
    $interval = DateInterval::createFromDateString('1 day');
    $days = new DatePeriod($begin, $interval, $end);
    foreach ( $days as $day ) {
        echo "i";
        $track = new TimeTracking();
        $track->user_id = 25;
        $track->date = $day->format('Y-m-d H:i:s');
        $track->active = 4;
        $track->save(false);
    }
    $time += microtime(true);
    echo count($days)." items insert - $time seconds\n";
}
and now the INSERT times are following:
InnoDB: items insert - 72.269570827484 seconds
MyISAM: items insert - 0.87537479400635 seconds
[EDIT] And now I measured the time for the whole save operation and for Yii's model save() function:
UPDATE: model->save(false) - 0.1096498966217 seconds
UPDATE: controller save function () - 0.1302649974823 seconds
CREATE: model->save(false) - 0.052282094955444 seconds
CREATE: controller save function () - 0.057214975357056 seconds
Why does just the save() method take so long?
[EDIT] I have tested save() vs. a direct command, and they take the same time:
$track->save(false);
or
$command = Yii::app()->db->createCommand();
$command->insert('timeTracking', array(
    'id' => NULL,
    'date' => $track->date,
    'active' => $track->active,
    'user_id' => $track->user_id,
));
EDIT -----------------------------
And here are the statistics for inserting 1,097 objects:
save(): 0.86-0.94,
$command->insert(): 0.67-0.72,
$command->execute(): 0.46-0.48,
mysql_query(): 0.33-0.36
FINAL ANSWER: If you need massive INSERT or UPDATE operations, you should consider implementing those functions with direct MySQL calls; you will save almost 70% of the execution time.
Regards,
Edgar
A table crawling on insert and update may indicate that you've got a bit carried away with your indexes. Remember that the DB has to update every index on the table for each insert or update.
With Yii and InnoDB you should wrap your commands in a transaction, like so:
$transaction = Yii::app()->db->beginTransaction();
try {
    // TODO Loop through all of your inserts
    $transaction->commit();
} catch (Exception $ex) {
    // There was some type of error. Roll back the last transaction
    $transaction->rollback();
}
The solution was: for those tables where you need big inserts and a quick response, convert them to MyISAM. Otherwise the user has to wait a long time, and there is a risk that PHP's max script execution time will stop your script.
InnoDB is a much more complex, feature-rich engine than MyISAM in many respects. That makes it slower and there is not much you can do in your queries, configuration, or otherwise to fix that. However, MySQL is trying to close the gap with recent updates.
Look at version 5.6:
http://dev.mysql.com/doc/refman/5.6/en/innodb-performance.html
Your best bet may be to upgrade your version of MySQL if you're behind.
InnoDB gives you the option to define relations and foreign-key constraints, and you will be able to generate models & CRUD from those relations.
You could also consider breaking the queries into smaller ones and executing them one by one.
As MySQL specialists say, use InnoDB until you can explicitly prove that you need MyISAM.
Performance alone is not a good argument.
If you have a big application, you will probably face problems with table-level locks and data inconsistency, so use InnoDB.
Your performance issue may instead be caused by missing indexes, or by hard disk and filesystem issues.
I have worked with tables of hundreds of millions of rows that were updated and inserted into constantly from several connections. Those tables used the InnoDB engine.
Of course, if you have data that should be added in a bunch, add it using one INSERT statement.
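A rough Java sketch of that "one insert statement" approach, building a single multi-row INSERT for the timeTracking table from the question (TimeTracking here is a plain stand-in record class, not Yii's ActiveRecord):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Sketch: one INSERT with many VALUES tuples instead of one INSERT per row.
static void insertChunk(Connection conn, List<TimeTracking> rows) throws SQLException {
    StringBuilder sql = new StringBuilder(
        "INSERT INTO timeTracking (date, active, user_id) VALUES ");
    for (int i = 0; i < rows.size(); i++) {
        sql.append(i == 0 ? "(?,?,?)" : ",(?,?,?)");
    }
    try (PreparedStatement ps = conn.prepareStatement(sql.toString())) {
        int p = 1;
        for (TimeTracking t : rows) {
            ps.setString(p++, t.date);     // e.g. "2010-01-01 00:00:00"
            ps.setInt(p++, t.active);
            ps.setInt(p++, t.userId);
        }
        ps.execute();
    }
}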

MySQL - Fastest way to check if data in InnoDB table has changed

My application is very database-intensive. Currently I'm running MySQL 5.5.19 and using MyISAM, but I'm in the process of migrating to InnoDB. The only problem left is checksum performance.
My application does about 500-1000 "CHECKSUM TABLE" statements per second at peak times, because the client GUI is constantly polling the database for changes (it is a monitoring system, so it must be very responsive and fast).
With MyISAM there are live checksums that are precalculated on table modification and are VERY fast. However, there is no such thing in InnoDB, so CHECKSUM TABLE is very slow...
I hoped to be able to check the last update time of the table. Unfortunately, that is not available in InnoDB either. I'm stuck now, because tests have shown that the performance of the application drops drastically...
There are simply too many lines of code that update the tables, so implementing logic in the application to log table changes is out of the question...
The database ecosystem consists of one master and 3 slaves, so local file checks are not an option.
I thought of a way to mimic a checksum cache - a lookup table with two columns (table_name, checksum) updated by triggers when changes in a table occur - but I have around 100 tables to monitor, and that means 3 triggers per table = 300 triggers. That would be hard to maintain, and I'm not sure it wouldn't become a performance hog again.
So, is there any FAST method to detect changes in InnoDB tables?
Thanks!
The simplest way is to add a nullable TIMESTAMP column with the ON UPDATE CURRENT_TIMESTAMP attribute.
Inserts will not change because the column accepts NULLs, and you can select only new and changed rows with:
SELECT * FROM `table` WHERE `mdate` > '2011-12-21 12:31:22'
Every time you update a row, this column changes automatically.
Here is some more information: http://dev.mysql.com/doc/refman/5.0/en/timestamp.html
To see deleted rows simply create a trigger which is going to log every deletion to another table:
DELIMITER $$
CREATE TRIGGER MyTable_Trigger
AFTER DELETE ON MyTable
FOR EACH ROW
BEGIN
INSERT INTO MyTable_Deleted VALUES(OLD.id, NOW());
END$$
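A polling client can then detect inserts and updates cheaply by reading the newest mdate value instead of running CHECKSUM TABLE (a sketch in JDBC form; deletions would be detected the same way from the MyTable_Deleted log table):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.sql.Timestamp;

// Sketch: compare the newest mdate with the value seen on the previous poll.
static Timestamp latestChange(Connection conn) throws SQLException {
    try (Statement st = conn.createStatement();
         ResultSet rs = st.executeQuery("SELECT MAX(mdate) AS latest FROM MyTable")) {
        rs.next();
        return rs.getTimestamp("latest");
    }
}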
I think I've found the solution. For some time I was looking at Percona Server to replace my MySQL servers, and now I think there is a good reason to do so.
Percona Server introduces many new INFORMATION_SCHEMA tables, like INNODB_TABLE_STATS, which aren't available in the standard MySQL server.
When you do:
SELECT rows, modified FROM information_schema.innodb_table_stats WHERE table_schema='db' AND table_name='table'
you get the actual row count and a counter. The official documentation says the following about this field:
If the value of modified column exceeds “rows / 16” or 2000000000, the
statistics recalculation is done when innodb_stats_auto_update == 1.
We can estimate the oldness of the statistics by this value.
So this counter wraps every once in a while, but you can make a checksum of the number of rows and the counter, and then with every modification of the table you get a unique checksum. E.g.:
SELECT MD5(CONCAT(rows,'_',modified)) AS checksum FROM information_schema.innodb_table_stats WHERE table_schema='db' AND table_name='table';
I was going to upgrade my servers to Percona Server anyway, so being tied to it is not an issue for me. Managing hundreds of triggers and adding fields to tables would be a major pain for this application, because it's very late in development.
This is the PHP function I've come up with to make sure that tables can be checksummed whatever engine and server is used:
function checksum_table($input_tables){
    if(!$input_tables) return false; // Sanity check
    $tables = (is_array($input_tables)) ? $input_tables : array($input_tables); // Make $tables always an array
    $where = "";
    $checksum = "";
    $found_tables = array();
    $tables_indexed = array();
    foreach($tables as $table_name){
        $tables_indexed[$table_name] = true; // Indexed array for faster searching
        if(strstr($table_name, ".")){ // If we are passing db.table_name
            $table_name_split = explode(".", $table_name);
            $where .= "(table_schema='".$table_name_split[0]."' AND table_name='".$table_name_split[1]."') OR ";
        }else{
            $where .= "(table_schema=DATABASE() AND table_name='".$table_name."') OR ";
        }
    }
    if($where != ""){ // Sanity check
        $where = substr($where, 0, -4); // Remove the last "OR"
        $get_chksum = mysql_query("SELECT table_schema, table_name, rows, modified FROM information_schema.innodb_table_stats WHERE ".$where);
        while($row = mysql_fetch_assoc($get_chksum)){
            if(isset($tables_indexed[$row['table_name']])){ // Not entirely foolproof, but saves some queries like "SELECT DATABASE()" to find out the current database
                $found_tables[$row['table_name']] = true;
            }elseif(isset($tables_indexed[$row['table_schema'].".".$row['table_name']])){
                $found_tables[$row['table_schema'].".".$row['table_name']] = true;
            }
            $checksum .= "_".$row['rows']."_".$row['modified']."_";
        }
    }
    foreach($tables as $table_name){
        if(empty($found_tables[$table_name])){ // Table is not found in information_schema.innodb_table_stats (probably not an InnoDB table, or not using Percona Server)
            $get_chksum = mysql_query("CHECKSUM TABLE ".$table_name); // Checksumming the old-fashioned way
            $chksum = mysql_fetch_assoc($get_chksum);
            $checksum .= "_".$chksum['Checksum']."_";
        }
    }
    $checksum = sprintf("%s", crc32($checksum)); // Using crc32 because it's faster than md5(). Must be returned as a string to prevent PHP's signed integer problems.
    return $checksum;
}
You can use it like this:
// Checksum a single table in the current db
$checksum = checksum_table("test_table");
// Checksum a single table in a db other than the current one
$checksum = checksum_table("other_db.test_table");
// Checksum multiple tables at once. It's faster when using Percona Server, because all tables are checksummed via one SELECT.
$checksum = checksum_table(array("test_table", "other_db.test_table"));
I hope this saves some trouble for other people having the same problem.

Processing data with Perl - SELECT ... FOR UPDATE usage with MySQL

I have a table that stores data that needs to be processed; it has id, status, and data columns. I'm currently selecting id, data where status = # and then doing an update immediately after the select, changing the status so that the row won't be selected again.
My program is multithreaded, and sometimes two threads grab the same id because they query the table at almost the same time. I looked into SELECT ... FOR UPDATE; however, I either wrote the query wrong or I'm not understanding what it is used for.
My goal is to find a way of grabbing the id and data that I need and setting the status so that no other thread tries to grab and process the same data. Here is the code I tried. (I wrote it all together here for show purposes; I actually set up my prepares at the beginning of the program so I don't re-prepare every time this runs, in case anyone was concerned about that.)
my $select = $db->prepare("SELECT id, data FROM `TestTable` WHERE _status=4 LIMIT ? FOR UPDATE") or die $DBI::errstr;
if ($select->execute($limit))
{
    while ($data = $select->fetchrow_hashref())
    {
        my $update_status = $db->prepare( "UPDATE `TestTable` SET _status = ?, data = ? WHERE _id=?");
        $update_status->execute(10, "", $data->{_id});
        push(@array_hash, $data);
    }
}
When I run this with multiple threads, I get many duplicate inserts when I later insert my processed transaction data.
I'm not terribly familiar with MySQL, and in the research I've done I haven't found anything that really cleared this up for me.
Thanks
As a sanity check, are you using InnoDB? MyISAM has zero transactional support, aside from faking it with full table locking.
I don't see where you're starting a transaction. MySQL's autocommit option is on by default, so starting a transaction and later committing would be necessary unless you turned off autocommit.
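For reference, the row-claiming pattern this is pointing at (explicit transaction, SELECT ... FOR UPDATE, then the status UPDATE, then COMMIT) looks roughly like this in JDBC terms; the table and column names are taken from the question, and the Perl DBI equivalent of the transaction control is begin_work/commit on the database handle:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Sketch: claim a batch of rows so no other worker can grab them. The row locks taken
// by FOR UPDATE are only held because everything runs inside one explicit transaction.
static void claimRows(Connection conn, int limit) throws SQLException {
    conn.setAutoCommit(false);
    try (PreparedStatement sel = conn.prepareStatement(
             "SELECT id, data FROM TestTable WHERE _status = 4 LIMIT ? FOR UPDATE");
         PreparedStatement upd = conn.prepareStatement(
             "UPDATE TestTable SET _status = 10 WHERE id = ?")) {
        sel.setInt(1, limit);
        try (ResultSet rs = sel.executeQuery()) {
            while (rs.next()) {
                upd.setInt(1, rs.getInt("id"));
                upd.executeUpdate();
                // ... hand rs.getString("data") off for processing ...
            }
        }
        conn.commit();   // releases the row locks
    } catch (SQLException e) {
        conn.rollback();
        throw e;
    } finally {
        conn.setAutoCommit(true);
    }
}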
It looks like you simply rely on the database locking mechanisms. I googled perl dbi locking and found this:
$dbh->do("LOCK TABLES foo WRITE, bar READ");
$sth->prepare("SELECT x,y,z FROM bar");
$sth2->prepare("INSERT INTO foo SET a = ?");
while (#ary = $sth->fetchrow_array()) {
$sth2->$execute($ary[0]);
}
$sth2->finish();
$sth->finish();
$dbh->do("UNLOCK TABLES");
Not really saying GIYF, as I am also fairly new to both MySQL and DBI, but perhaps you can find other answers that way.
Another option might be as follows, and it only works if you control all the code accessing the data. You can create a lock column in the table. When your code accesses the table, it does (pseudocode):
if row.lock != 1
    row.lock = 1
    read row
    update row
    row.lock = 0
    next
else
    sleep 1
    redo
Again, though, this trusts that all users/scripts that access this data will agree to follow this policy. If you cannot ensure that, then this won't work.
Anyway, that's all the knowledge I have on the topic. Good luck!

How can I INSERT 1 million entries to my MySQL DB?

I want to test the speed of my SQL queries (update queries) with a realistic "load" on my DB. I'm relatively new to databases and I am doing more complex queries than I have before, and I'm getting scared by people talking about performance like "30 seconds for 3000 records to be updated", etc. So I want a concrete experiment showing what my performance will be in production.
To achieve this, I want to add 10k, 100k, 1M, 10M records to my DB and then run my query.
My issue is: how can I do this? I have a "name" primary key field that must be unique, be <= 15 characters, and be alphanumeric. The other fields I want to be the same for all created entries (e.g. a "foo" field that starts at 10000).
If there's a way to do this that gets approximately 1M entries (i.e. name collisions are acceptable), that's fine. I'm just looking for a benchmarking dataset.
If there's a better way to benchmark my query, I'm all ears. I'm planning to simply execute it and see how long the query says it takes.
Edit: It's worth noting that this is for a server and has nothing to do with "the Web", so I don't have access to PHP. I'm seeing some PHP scripts for populating a database; is there perhaps a way to have a Perl script write out all these queries and then feed them into the command-line mysql tools?
I'm not sure of how to use just MySQL to accomplish this, but if you have access to PHP, then use this:
<?php
$start = time();
$interval = 10000000; // 10M
$con = mysql_connect( 'server', 'user', 'pass' );
mysql_select_db( 'database' );
for ( $i = 0; $i < $interval; $i++ )
{
mysql_query( 'INSERT INTO TABLE (fields) VALUES (values)', $con );
}
$endt = time();
$diff = ( $endt - $start );
print( "{$interval} queries took " . date( 'g:i:s', $diff ) . " to execute." );
?>
If you want to optimize queries, you should look into MySQL's EXPLAIN statement.
To populate your database, I would suggest you write your own little PHP script or check out this one:
http://www.generatedata.com
Regarding your edit:
you could generate a big text file with Perl and then use the MySQL CLI to load the file into the table; for more info please see:
http://dev.mysql.com/doc/refman/5.0/en/loading-tables.html
You just want to prepopulate your database so that you have something to run your queries against, and you are not benchmarking the initial insertion process?
In that case, just generate your input data as a tab-delimited file and use mysqlimport to populate your database.
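Since the question rules out PHP, here is a rough Java sketch of generating such a tab-delimited file (the column layout and file name are invented; each "name" is unique, alphanumeric, and exactly 15 characters, and "foo" is the same for every row):

import java.io.IOException;
import java.io.PrintWriter;

// Sketch: write 1M tab-separated rows for mysqlimport / LOAD DATA INFILE.
public class MakeTestData {
    public static void main(String[] args) throws IOException {
        try (PrintWriter out = new PrintWriter("testtable.txt")) {
            for (int i = 0; i < 1_000_000; i++) {
                String name = String.format("n%014d", i);   // "n" + 14 digits = 15 chars, unique
                int foo = 10000;                             // same value for every row
                out.println(name + "\t" + foo);
            }
        }
    }
}

mysqlimport derives the table name from the file name, so this file would load into a table called testtable with something like: mysqlimport --local your_db testtable.txt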