EF6 & MySQL - Avoid selecting Id after inserting record

I have a program that inserts thousands of records into a MySQL DB. The operation cannot be done in bulk for a variety of reasons. Overall, the operations are very slow.
After looking at the SQL that is being generated, I can see that EF is calling a select to get the Id of the recently inserted record.
SET SESSION sql_mode = 'ANSI';
INSERT INTO `table`('blah') VALUES(86613784);
SELECT `id`
FROM `table`
WHERE row_count() > 0 AND `id` = last_insert_id()
Since I don't need that Id, how can I tell EF to avoid the call and save me the time?
FYI - I am already using the following statements to speed things up as well.
Configuration.ValidateOnSaveEnabled = false;
Configuration.AutoDetectChangesEnabled = false;
As requested, here is the code used to create the record. Not much to it, but if it helps...
using (var ctx = new Tc_TrademarkEntities(_entityFrameworkConnectionString))
{
    ctx.case_file.Add(request.Trademark);
    ctx.SaveChanges();
}


.NET insert data from SQL Server to MySQL without looping through data

I'm working on a .NET app where the user selects some filters like date, id, etc.
What I need to do is query a SQL Server database table with those filters and dump the results into a MySQL table. I don't need all the fields in the table, only a few.
So far, I have to loop through all records in the SQL Server DataSet and insert them one by one into my MySQL table.
Is there any way of achieving better performance? I've been playing with Dapper but can't figure out a way to do something like:
Insert into MySQLTable (a,b,c)
Select a,b,c from SQLServerTable
where a=X and b=C
Any ideas?
A linked server is not an option because we have no access to the SQL Server configuration, so I'm looking for the most efficient way of bulk inserting the data.
If I were to do this inside .NET with Dapper, I'd use C# and do the following.
Assumptions:
A table in both databases with the same schema:
CREATE TABLE Events (
    EventId int,
    EventName varchar(10));
A .NET class:
public class Event
{
    public int EventId { get; set; }
    public string EventName { get; set; }
}
The snippet below should give you something you can use as a base.
List<Event> Events = new List<Event>();
var sqlInsert = "Insert into events( EventId, EventName ) values (@EventId, @EventName)";
using (IDbConnection sqlconn = new SqlConnection(Sqlconstr))
{
    sqlconn.Open();
    Events = sqlconn.Query<Event>("Select * from events").ToList();
    // The MySQL side needs a MySqlConnection (from the MySQL .NET connector) and its
    // own connection string; MySqlConnStr here is a placeholder
    using (IDbConnection mySqlconn = new MySqlConnection(MySqlConnStr))
    {
        mySqlconn.Open();
        mySqlconn.Execute(sqlInsert, Events);
    }
}
The snippet above selects the rows from the events table in SQL Server and populates the Events list. Normally Dapper returns an IEnumerable<>, but here it is materialized with ToList(). Then, with the Events list in hand, you connect to MySQL and execute the insert statement against the list.
This snippet is just a barebones example. Without a transaction on the Execute, each row will be autocommitted. If you add a transaction, it will commit once all the items in the Events list are inserted.
Of course there are disadvantages to doing it this way. One important thing to realize is that if you are trying to insert 1 million rows from SQL Server to MySQL, that list will contain 1 million entries, which increases the memory footprint. In those cases I'd use Dapper's Buffered = false option. This returns the 1 million rows one row at a time; your C# code can then enumerate over the results, add each row to a list, and keep a counter. After 1000 rows have been added to the list, you can do the insert into MySQL, then clear the list and continue enumerating over the rows.
This will keep the memory footprint of your application small while processing a large number of rows.
With all that said, nothing beats bulk insert at the server level.
-HTH

How to run 'SELECT FOR UPDATE' in Laravel 3 / MySQL

I am trying to execute SELECT ... FOR UPDATE query using Laravel 3:
SELECT * from projects where id = 1 FOR UPDATE;
UPDATE projects SET money = money + 10 where id = 1;
I have tried several things for several hours now:
DB::connection()->pdo->exec($query);
and
DB::query($query)
I have also tried adding START TRANSACTION; ... COMMIT; to the query
and I tried to separate the SELECT from the UPDATE in two different parts like this:
DB::query($select);
DB::query($update);
Sometimes I get 0 rows affected, sometimes I get an error like this one:
SQLSTATE[HY000]: General error: 2014 Cannot execute queries while other unbuffered queries are active. Consider using PDOStatement::fetchAll(). Alternatively, if your code is only ever going to run against mysql, you may enable query buffering by setting the PDO::MYSQL_ATTR_USE_BUFFERED_QUERY attribute.
SQL: UPDATE `sessions` SET `last_activity` = ?, `data` = ? WHERE `id` = ?
I want to lock the row in order to update sensitive data, using Laravel's database connection.
Thanks.
If all you need to do is increase money by 10, you don't need to lock the row before the update. Simply executing the update query will do the job; the SELECT query will only slow down your script and doesn't help in this case.
UPDATE projects SET money = money + 10 where id = 1;
I would use different queries for sure, so you can have control over what you are doing.
I would use a transaction.
If we read these simple explanations, PDO transactions are quite straightforward. They give us this simple but complete example, which illustrates how everything works as we should expect (consider $db to be your DB::connection()->pdo):
try {
    $db->beginTransaction();
    $db->exec("SOME QUERY");
    $stmt = $db->prepare("SOME OTHER QUERY?");
    $stmt->execute(array($value));
    $stmt = $db->prepare("YET ANOTHER QUERY??");
    $stmt->execute(array($value2, $value3));
    $db->commit();
}
catch(PDOException $ex) {
    // Something went wrong, roll back!
    $db->rollBack();
    echo $ex->getMessage();
}
Let's get to your real statements. For the first of them, the SELECT, I wouldn't use exec but query, since as stated here:
PDO::exec() does not return results from a SELECT statement. For a
SELECT statement that you only need to issue once during your program,
consider issuing PDO::query(). For a statement that you need to issue
multiple times, prepare a PDOStatement object with PDO::prepare() and
issue the statement with PDOStatement::execute().
And assign its result to some temp variable, like:
$result = $db->query($select);
After this execution, I would call $result->fetchAll() or $result->closeCursor(), since as we can read here:
If you do not fetch all of the data in a result set before issuing
your next call to PDO::query(), your call may fail. Call
PDOStatement::closeCursor() to release the database resources
associated with the PDOStatement object before issuing your next call
to PDO::query().
Then you can exec the update:
$result = $db->exec($update);
Note that this time $result is just the number of affected rows; PDO::exec() does not return a result set, so there is nothing to fetch or close afterwards.
If the aim is
to lock the row in order to update sensitive data, using Laravel's database connection.
Maybe you can use PDO transactions:
DB::connection()->pdo->beginTransaction();
DB::connection()->pdo->commit();
DB::connection()->pdo->rollBack();

MySQL - Fastest way to check if data in InnoDB table has changed

My application is very database intensive. Currently, I'm running MySQL 5.5.19 and using MyISAM, but I'm in the process of migrating to InnoDB. The only problem left is checksum performance.
My application does about 500-1000 "CHECKSUM TABLE" statements per second at peak times, because the client's GUI is constantly polling the database for changes (it is a monitoring system, so it must be very responsive and fast).
With MyISAM, there are live checksums that are precalculated on table modification and are VERY fast. However, there is no such thing in InnoDB, so CHECKSUM TABLE is very slow...
I hoped to be able to check the last update time of the table. Unfortunately, this is not available in InnoDB either. I'm stuck now, because tests have shown that the performance of the application drops drastically...
There are simply too many lines of code that update the tables, so implementing logic in the application to log table changes is out of the question...
The database ecosystem consists of one master and 3 slaves, so local file checks are not an option.
I thought of a method to mimic a checksum cache - a lookup table with two columns - table_name, checksum - updated by triggers when changes in a table occur. But I have around 100 tables to monitor, and this means 3 triggers per table = 300 triggers. Hard to maintain, and I'm not sure that this won't be a performance hog again.
So is there any FAST method to detect changes in InnoDB tables?
Thanks!
The simplest way is to add a nullable column of type TIMESTAMP with the ON UPDATE CURRENT_TIMESTAMP attribute.
Therefore, the inserts will not need to change, because the column accepts NULLs, and you can select only new and changed rows by saying:
SELECT * FROM `table` WHERE `mdate` > '2011-12-21 12:31:22'
Every time you update a row this column will change automatically.
Here is some more information: http://dev.mysql.com/doc/refman/5.0/en/timestamp.html
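For example, the DDL could look like this (a minimal sketch, matching the `mdate` column used in the SELECT above):
-- Nullable TIMESTAMP that MySQL updates automatically on every UPDATE;
-- plain INSERTs leave it NULL, so the column never has to appear in them
ALTER TABLE `table`
    ADD COLUMN `mdate` TIMESTAMP NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP;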
To see deleted rows simply create a trigger which is going to log every deletion to another table:
DELIMITER $$
CREATE TRIGGER MyTable_Trigger
AFTER DELETE ON MyTable
FOR EACH ROW
BEGIN
    INSERT INTO MyTable_Deleted VALUES(OLD.id, NOW());
END$$
DELIMITER ;
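The trigger assumes the log table already exists; a minimal sketch (name and columns assumed):
-- Hypothetical log table matching the INSERT in the trigger above
CREATE TABLE MyTable_Deleted (
    id INT NOT NULL,
    deleted_at DATETIME NOT NULL
);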
I think I've found the solution. For some time I was looking at Percona Server to replace my MySQL servers, and now I think there is a good reason for it.
Percona Server introduces many new INFORMATION_SCHEMA tables like INNODB_TABLE_STATS, which isn't available in the standard MySQL server.
When you do:
SELECT rows, modified FROM information_schema.innodb_table_stats WHERE table_schema='db' AND table_name='table'
You get the actual row count and a counter. The official documentation says the following about this field:
If the value of modified column exceeds “rows / 16” or 2000000000, the
statistics recalculation is done when innodb_stats_auto_update == 1.
We can estimate the oldness of the statistics by this value.
So this counter wraps every once in a while, but you can make a checksum of the number of rows and the counter, and then with every modification of the table you get a unique checksum. E.g.:
SELECT MD5(CONCAT(rows,'_',modified)) AS checksum FROM information_schema.innodb_table_stats WHERE table_schema='db' AND table_name='table';
I was going to upgrade my servers to Percona Server anyway, so this binding is not an issue for me. Managing hundreds of triggers and adding fields to tables is a major pain for this application, because it's very late in development.
This is the PHP function I've come up with to make sure that tables can be checksummed whatever engine and server is used:
function checksum_table($input_tables){
    if(!$input_tables) return false; // Sanity check
    $tables = (is_array($input_tables)) ? $input_tables : array($input_tables); // Make $tables always an array
    $where = "";
    $checksum = "";
    $found_tables = array();
    $tables_indexed = array();
    foreach($tables as $table_name){
        $tables_indexed[$table_name] = true; // Indexed array for faster searching
        if(strstr($table_name, ".")){ // If we are passing db.table_name
            $table_name_split = explode(".", $table_name);
            $where .= "(table_schema='".$table_name_split[0]."' AND table_name='".$table_name_split[1]."') OR ";
        }else{
            $where .= "(table_schema=DATABASE() AND table_name='".$table_name."') OR ";
        }
    }
    if($where != ""){ // Sanity check
        $where = substr($where, 0, -4); // Remove the last "OR"
        $get_chksum = mysql_query("SELECT table_schema, table_name, rows, modified FROM information_schema.innodb_table_stats WHERE ".$where);
        while($row = mysql_fetch_assoc($get_chksum)){
            if($tables_indexed[$row['table_name']]){ // Not entirely foolproof, but saves some queries like "SELECT DATABASE()" to find out the current database
                $found_tables[$row['table_name']] = true;
            }elseif($tables_indexed[$row['table_schema'].".".$row['table_name']]){
                $found_tables[$row['table_schema'].".".$row['table_name']] = true;
            }
            $checksum .= "_".$row['rows']."_".$row['modified']."_";
        }
    }
    foreach($tables as $table_name){
        if(!$found_tables[$table_name]){ // Table not found in information_schema.innodb_table_stats (probably not an InnoDB table, or not Percona Server)
            $get_chksum = mysql_query("CHECKSUM TABLE ".$table_name); // Checksumming the old-fashioned way
            $chksum = mysql_fetch_assoc($get_chksum);
            $checksum .= "_".$chksum['Checksum']."_";
        }
    }
    $checksum = sprintf("%s", crc32($checksum)); // Using crc32 because it's faster than md5(). Must be returned as a string to prevent PHP's signed integer problems.
    return $checksum;
}
You can use it like this:
// Checksum a single table in the current db
$checksum = checksum_table("test_table");
// Checksum a single table in a db other than the current one
$checksum = checksum_table("other_db.test_table");
// Checksum multiple tables at once. It's faster when using Percona Server, because all tables are checksummed via one select.
$checksum = checksum_table(array("test_table", "other_db.test_table"));
I hope this saves some trouble for other people having the same problem.

Processing data with Perl - SELECT FOR UPDATE usage with MySQL

I have a table that is storing data that needs to be processed. I have id, status, data in the table. I'm currently going through and selecting id, data where status = #. I'm then doing an update immediately after the select, changing the status # so that it won't be selected again.
My program is multithreaded, and sometimes two threads grab the same id because they query the table at almost the same time. I looked into SELECT FOR UPDATE; however, I either did the query wrong or I'm not understanding what it is used for.
My goal is to find a way of grabbing the id and data that I need and setting the status so that no other thread tries to grab and process the same data. Here is the code I tried. (I wrote it all together for show purposes here; in the real program the prepares are set up once at the beginning, so a prepare isn't done each time it runs, in case anyone was concerned.)
my $select = $db->prepare("SELECT _id, data FROM `TestTable` WHERE _status=4 LIMIT ? FOR UPDATE") or die $DBI::errstr;
if ($select->execute($limit))
{
    while ($data = $select->fetchrow_hashref())
    {
        my $update_status = $db->prepare("UPDATE `TestTable` SET _status = ?, data = ? WHERE _id=?");
        $update_status->execute(10, "", $data->{_id});
        push(@array_hash, $data);
    }
}
When I run this with multiple threads, I get many duplicate inserts when trying to do an insert after I process my transaction data.
I'm not terribly familiar with MySQL, and in the research I've done I haven't found anything that really cleared this up for me.
Thanks
As a sanity check, are you using InnoDB? MyISAM has zero transactional support, aside from faking it with full table locking.
I don't see where you're starting a transaction. MySQL's autocommit option is on by default, so starting a transaction and later committing would be necessary unless you turned off autocommit.
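In plain SQL, the transactional claim the question is after would look roughly like this (a sketch using the table from the question; the id value is illustrative):
START TRANSACTION;
-- Lock the candidate rows; concurrent workers block here instead of grabbing them too
SELECT _id, data FROM `TestTable` WHERE _status = 4 LIMIT 10 FOR UPDATE;
-- Mark the claimed rows before the locks are released
UPDATE `TestTable` SET _status = 10 WHERE _id = 42;
COMMIT;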
It looks like you simply rely on the database locking mechanisms. I googled perl dbi locking and found this:
$dbh->do("LOCK TABLES foo WRITE, bar READ");
$sth = $dbh->prepare("SELECT x,y,z FROM bar");
$sth2 = $dbh->prepare("INSERT INTO foo SET a = ?");
$sth->execute();
while (@ary = $sth->fetchrow_array()) {
    $sth2->execute($ary[0]);
}
$sth2->finish();
$sth->finish();
$dbh->do("UNLOCK TABLES");
Not really saying GIYF as I am also fairly novice at both MySQL and DBI, but perhaps you can find other answers that way.
Another option might be as follows, and this only works if you control all the code accessing the data. You can create a lock column in the table. When your code accesses the table, it does the following (pseudocode):
if row.lock != 1
    row.lock = 1
    read row
    update row
    row.lock = 0
    next
else
    sleep 1
    redo
Again though, this trusts that all users/scripts that access this data will agree to follow this policy; if you cannot ensure that, then this won't work.
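If you do control all the code, the claim itself can be made atomic with a single UPDATE, so no two workers ever pass the check at the same time. A hypothetical sketch (the locked column and the LAST_INSERT_ID() trick are assumptions, not the poster's schema):
-- Atomically claim one unclaimed row; the row lock taken by the UPDATE
-- guarantees only one connection succeeds for a given row
UPDATE TestTable
    SET locked = 1, _id = LAST_INSERT_ID(_id)
    WHERE locked = 0
    LIMIT 1;
-- Returns the _id of the row this connection just claimed
SELECT LAST_INSERT_ID();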
Anyway, that's all the knowledge I have on the topic. Good luck!

JDBC batch insert performance

I need to insert a couple hundred million records into the MySQL db. I'm batch inserting 1 million at a time. Please see my code below. It seems to be slow. Is there any way to optimize it?
try {
    // Disable auto-commit so the whole batch commits at once
    connection.setAutoCommit(false);
    // Create a prepared statement
    String sql = "INSERT INTO mytable (xxx) VALUES (?)";
    PreparedStatement pstmt = connection.prepareStatement(sql);
    Object[] vals = set.toArray();
    for (int i = 0; i < vals.length; i++) {
        pstmt.setString(1, vals[i].toString());
        pstmt.addBatch();
    }
    // Execute the batch and commit
    int[] updateCounts = pstmt.executeBatch();
    connection.commit();
    System.out.println("inserted " + updateCounts.length);
} catch (SQLException e) {
    e.printStackTrace();
}
I had a similar performance issue with MySQL and solved it by setting the useServerPrepStmts and rewriteBatchedStatements properties in the connection URL.
Connection c = DriverManager.getConnection("jdbc:mysql://host:3306/db?useServerPrepStmts=false&rewriteBatchedStatements=true", "username", "password");
I'd like to expand on Bertil's answer, as I've been experimenting with the connection URL parameters.
rewriteBatchedStatements=true is the important parameter. useServerPrepStmts is already false by default, and even changing it to true doesn't make much difference in terms of batch insert performance.
Now I think it is time to explain how rewriteBatchedStatements=true improves performance so dramatically. It does so by rewriting batched prepared INSERT statements into a multi-value insert when executeBatch() is called (Source). That means that instead of sending the following n INSERT statements to the MySQL server each time executeBatch() is called:
INSERT INTO X VALUES (A1,B1,C1)
INSERT INTO X VALUES (A2,B2,C2)
...
INSERT INTO X VALUES (An,Bn,Cn)
it would send a single INSERT statement:
INSERT INTO X VALUES (A1,B1,C1),(A2,B2,C2),...,(An,Bn,Cn)
You can observe this by turning on MySQL general logging (SET GLOBAL general_log = 1), which logs each statement sent to the MySQL server to a file.
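For example (the log file path is an assumption):
-- Send the general query log to a file and enable it while testing
SET GLOBAL general_log_file = '/tmp/mysql-general.log';
SET GLOBAL general_log = 1;
-- ... run the batch insert, inspect the file, then turn the log off ...
SET GLOBAL general_log = 0;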
You can insert multiple rows with one INSERT statement; doing a few thousand at a time can greatly speed things up. That is, instead of doing e.g. 3 inserts of the form INSERT INTO tbl_name (a,b,c) VALUES(1,2,3);, you do INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(1,2,3),(1,2,3);. (It might be that JDBC's .addBatch() does a similar optimization now - though the MySQL addBatch used to be entirely un-optimized, just issuing individual queries anyhow - I don't know if that's still the case with recent drivers.)
If you really need speed, load your data from a comma-separated file with LOAD DATA INFILE; we get around a 7-8x speedup doing that versus tens of millions of individual inserts.
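A minimal sketch of the LOAD DATA INFILE route (file path and column list are assumptions):
-- Bulk-load a CSV straight into the table; one statement instead of millions of INSERTs
LOAD DATA INFILE '/tmp/data.csv'
INTO TABLE mytable
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(xxx);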
If:
It's a new table, or the amount to be inserted is greater than the already inserted data
There are indexes on the table
You do not need other access to the table during the insert
Then ALTER TABLE tbl_name DISABLE KEYS can greatly improve the speed of your inserts. When you're done, run ALTER TABLE tbl_name ENABLE KEYS to start building the indexes, which can take a while, but not nearly as long as doing it for every insert.
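Sketched out, the whole sequence looks like this (table name taken from the statements above):
ALTER TABLE tbl_name DISABLE KEYS;
-- ... run the bulk INSERTs (or LOAD DATA INFILE) here ...
ALTER TABLE tbl_name ENABLE KEYS;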
You may try using a DDBulkLoad object.
// Get a DDBulkLoad object
DDBulkLoad bulkLoad = DDBulkLoadFactory.getInstance(connection);
bulkLoad.setTableName("mytable");
bulkLoad.load("data.csv");
Another variation is to execute the batch every N rows instead of accumulating everything before a single executeBatch():
try {
    // Disable auto-commit
    connection.setAutoCommit(false);
    int maxInsertBatch = 10000;
    // Create a prepared statement
    String sql = "INSERT INTO mytable (xxx) VALUES (?)";
    PreparedStatement pstmt = connection.prepareStatement(sql);
    Object[] vals = set.toArray();
    int count = 0;
    for (int i = 0; i < vals.length; i++) {
        pstmt.setString(1, vals[i].toString());
        pstmt.addBatch();
        count++;
        // Flush every maxInsertBatch rows so the pending batch stays bounded
        if (count % maxInsertBatch == 0) {
            pstmt.executeBatch();
        }
    }
    // Execute whatever is left in the batch and commit
    pstmt.executeBatch();
    connection.commit();
    System.out.println("inserted " + count);
} catch (SQLException e) {
    e.printStackTrace();
}