Having a major hair-pulling issue with extremely slow inserts from Delphi 2010 to a remote MySQL 5.09 server.
So far, I have tried:
ADO using MySQL ODBC Driver
Zeoslib v7 Alpha
MyDAC
I have used batching and direct insert with ADO (using table access). With Zeos I have used SQL insertion with a Query, then Table direct mode, and also cached-updates Table mode using ApplyUpdates and Commit. With MyDAC I used table access mode, then direct SQL insert, and then batched SQL insert.
With every technology I tried, I turned compression on and off with no discernible difference.
So far I have seen pretty much the same speed across the board: 7.5 records per second!!!
Now, I would from this point assume that the remote server is just slow, but the MySQL Workbench is amazingly fast, and the Migration toolkit managed the initial migration very quickly (to be honest, I don't recall how quickly - which kind of means that it was quick)
Edit 1
It is quicker for me to write the sql to a file, upload the file to the server via ftp and then import it direct on the remote server - I wonder if they perhaps are throttling incoming MySQL traffic, but that doesn't explain why the MySQL Workbench was so quick!
Edit 2
At the most basic level, the code has been:
while not qMSSQL.EOF do
begin
  qMySQL.SQL.Clear;
  qMySQL.SQL.Add('INSERT INTO tablename (fieldname1) VALUES (:fieldname1)');
  qMySQL.ParamByName('fieldname1').asString:=qMSSQL.FieldByName('fieldname1').asString;
  qMySQL.ExecSQL;
  qMSSQL.Next;
end;
I then tried
qMySQL.CachedUpdates:=true;
i:=0;
while not qMSSQL.EOF do
begin
  qMySQL.SQL.Clear;
  qMySQL.SQL.Add('INSERT INTO tablename (fieldname1) VALUES (:fieldname1)');
  qMySQL.ParamByName('fieldname1').asString:=qMSSQL.FieldByName('fieldname1').asString;
  qMySQL.ExecSQL;
  inc(i);
  if i>100 then
  begin
    qMySQL.ApplyUpdates;
    i:=0;
  end;
  qMSSQL.Next;
end;
qMySQL.ApplyUpdates;
Now, in this code with CachedUpdates:=False (which obviously never actually wrote back to the database) the speed was blisteringly fast!!
To be perfectly honest, I think it's the connection - I feel it's the connection... Just waiting for them to get back to me!
Thanks for all your help!
You can try AnyDAC and its Array DML feature. It may speed up a standard SQL INSERT several times over.
Sorry that this reply comes long after you asked the question.
I had a similar problem. BDS2006 to MySQL via ODBC across the network - took 25 minutes to run - around 25 inserts per second. I was using a TDatabase connection with the TTable and TQuery components attached to it, and I prepared the SQL statements.
The major improvement came when I started using transactions within the loop. A simple example: Memberships have Member Periods. Start a transaction before the insert of the Membership and Members, and commit after. The number of memberships was 1585; before transactions it took 279.90 seconds to process all the Membership records, but afterwards it took 6.71 seconds.
Almost too good to believe and am still working through fixing the code for the other slow bits.
Maybe Mark you have solved your problem but it may help someone else.
Are you using query parameters? The fastest way to insert should be using plain queries and parameters (i.e. INSERT INTO table (field) VALUES (:field) ), preparing the query and then assigning parameters and executing as many times as required within a single transaction - committing at the end (don't use any flavour of autocommit)
That in most databases avoids hard parses each time the query is executed, which requires time. Parameters allow the query to be parsed only once, and then re-executed many times as needed.
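A minimal sketch of the pattern described above (one parameterized statement, executed many times, one commit at the end). It is written in Python against an in-memory SQLite database purely so that it runs anywhere; SQLite is a stand-in for MySQL here, the `?` placeholder style is SQLite's, and `insert_rows` is a hypothetical helper name. The table and column names are borrowed from the question.

```python
import sqlite3


def insert_rows(conn, values):
    """Insert all values with one parameterized statement in one transaction."""
    sql = "INSERT INTO tablename (fieldname1) VALUES (?)"
    # "with conn" commits once when the block succeeds and rolls back on error,
    # so there is no per-row commit.
    with conn:
        conn.executemany(sql, ((v,) for v in values))


# Stand-in database (assumption: SQLite instead of the remote MySQL server).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (fieldname1 TEXT)")

insert_rows(conn, ["a", "b", "c"])
count = conn.execute("SELECT COUNT(*) FROM tablename").fetchone()[0]
```

The point of the sketch is the shape of the loop, not the driver: the statement text never changes, only the bound values do, and the commit happens once.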
Use the server facilities to check what's going on - many offer a way to inspect what running statements are doing.
I'm not sure about ZeosLib, but using ADO with the ODBC driver you will not get the fastest way to insert the records. Here are a few steps that may make your insertion faster:
1) Use MyDAC for direct access; it works without the slow ODBC > ADO > OLEDB > MySqlLib chain to connect to MySQL.
2) Open the connection first, before the insertion.
3) If you have a large insertion, such as 1000 records or more, try using a transaction and committing after every 100 records or more, depending on the number of records.
Point 3 may make your insertion faster even with ZeosLib or ADO.
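The "commit every N records" idea can be sketched as follows. This is an illustration in Python with an in-memory SQLite database standing in for MySQL; `insert_in_batches` and the batch size of 100 are assumptions chosen to mirror the advice, not part of any of the Delphi libraries mentioned.

```python
import sqlite3
from itertools import islice


def insert_in_batches(conn, values, batch_size=100):
    """Insert values, committing once per batch; returns the number of commits."""
    it = iter(values)
    commits = 0
    while True:
        batch = list(islice(it, batch_size))  # next batch_size values, or fewer
        if not batch:
            break
        conn.executemany(
            "INSERT INTO tablename (fieldname1) VALUES (?)",
            [(v,) for v in batch],
        )
        conn.commit()  # one commit per batch instead of one per row
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (fieldname1 TEXT)")
commits = insert_in_batches(conn, (str(n) for n in range(250)))
total = conn.execute("SELECT COUNT(*) FROM tablename").fetchone()[0]
```

With 250 rows and a batch size of 100 this performs three commits instead of 250, which is where the saving over a slow link is expected to come from.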
You've got two separate things going on here. First, your Delphi program is creating Insert statements and sending them to the DB server, and then the server is handling them. You need to examine both ends to find the bottleneck. I'm not too familiar with MySQL tools, but I bet you could find a SQL profiler for it easily enough. Use it to profile your inserts from the Delphi app, and compare it to running inserts from the Workbench tool and see if there's a significant difference.
If not, then the slowdown is in your app. Try hooking it up to Sampling Profiler or some other profiling tool that understands Delphi, and it'll show you where you're spending lots of time. Once you know that, then you can work on attacking the problem, or maybe come back here to ask a more specific question. But until you know where the problem is coming from, any answers you get here are just gonna be educated guesses at best.
Related
I have a MySQL database where every day I make a new table, and I end up with around 5M rows by the end of the day. My problem is that during the peak hour our reporting gets really slow. We are trying to figure out the cause of the issue. One reason could be that the select queries cannot execute properly because our system keeps inserting frequently during the peak hour. From the MySQL website I learned that concurrent inserts can be a solution to this problem.
Therefore, how can I enable concurrent inserts in my system and check if it is working? And what issues I may face if I use it?
I am using MySQL server 5.0.95
To enable concurrent inserts, set the server system variable concurrent_insert to 2 (for example, "concurrent_insert=2" in the server configuration).
I have a routine in MySQL that is very long and has multiple SELECT, INSERT, and UPDATE statements in it with some IFs and REPEATs. It's been running fine until lately, where it's hanging and taking over 20 seconds to complete (which is unacceptable considering it used to take 1 second or so).
What is the quickest and easiest way for me to find out where in the routine the bottleneck is coming from? Basically the routine is getting stopped up at some point... how can I find out where that is without breaking apart the routine and testing each section one by one?
If you use Percona Server (a free distribution of MySQL with many enhancements), you can make the slow-query log record times for individual queries, using the log_slow_sp_statements configuration variable. See http://www.percona.com/doc/percona-server/5.5/diagnostics/slow_extended_55.html
If you're using stock MySQL, you can add statements in the stored procedure to set a series of session variables to the value returned by the SYSDATE() function. Use a different session variable at different points in the SP. Then after you run the SP in a test execution, you can inspect the values of these session variables to see what section of the SP took the longest.
To analyze a query, you can look at its execution plan. It is not always an easy task, but with a bit of reading you will find the solution. Here are some useful links:
http://dev.mysql.com/doc/refman/5.5/en/execution-plan-information.html
http://dev.mysql.com/doc/refman/5.0/en/explain.html
http://dev.mysql.com/doc/refman/5.0/en/using-explain.html
http://www.lornajane.net/posts/2011/explaining-mysqls-explain
I currently have a PostgreSQL database, because one of the pieces of software we're using only supports this particular database engine. I then have a query which summarizes and splits the data from the app into a more useful format.
In my MySQL database, I have a table which contains an identical schema to the output of the query described above.
What I would like to develop is an hourly cron job which will run the query against the PostgreSQL database, then insert the results into the MySQL database. During the hour period, I don't expect to ever see more than 10,000 new rows (and that's a stretch) which would need to be transferred.
Both databases are on separate physical servers, continents apart from one another. The MySQL instance runs on Amazon RDS - so we don't have a lot of control over the machine itself. The PostgreSQL instance runs on a VM on one of our servers, giving us complete control.
The duplication is, unfortunately, necessary because the PostgreSQL database only acts as a collector for the information, while the MySQL database has an application running on it which needs the data. For simplicity, we're wanting to do the move/merge and delete from PostgreSQL hourly to keep things clean.
To be clear - I'm a network/sysadmin guy - not a DBA. I don't really understand all of the intricacies necessary in converting one format to the other. What I do know is that the data being transferred consists of 1xVARCHAR, 1xDATETIME and 6xBIGINT columns.
The closest guess I have for an approach is to use some scripting language to make the query, convert results into an internal data structure, then split it back out to MySQL again.
In doing so, are there any particular good or bad practices I should be wary of when writing the script? Or - any documentation that I should look at which might be useful for doing this kind of conversion? I've found plenty of scheduling jobs which look very manageable and well-documented, but the ongoing nature of this script (hourly run) seems less common and/or less documented.
Open to any suggestions.
Use the same database system on both ends and use replication
If your remote end was also PostgreSQL, you could use streaming replication with hot standby to keep the remote end in sync with the local one transparently and automatically.
If the local end and remote end were both MySQL, you could do something similar using MySQL's various replication features like binlog replication.
Sync using an external script
There's nothing wrong with using an external script. In fact, even if you use DBI-Link or similar (see below) you probably have to use an external script (or psql) from a cron job to initiate replication, unless you're going to use PgAgent to do it.
Either accumulate rows in a queue table maintained by a trigger procedure, or make sure you can write a query that always reliably selects only the new rows. Then connect to the target database and INSERT the new rows.
If the rows to be copied are too big to comfortably fit in memory, you can use a cursor and read the rows with FETCH.
I'd do the work in this order:
Connect to PostgreSQL
Connect to MySQL
Begin a PostgreSQL transaction
Begin a MySQL transaction. If your MySQL is using MyISAM, go and fix it now.
Read the rows from PostgreSQL, possibly via a cursor or with DELETE FROM queue_table RETURNING *
Insert them into MySQL
DELETE any rows from the queue table in PostgreSQL if you haven't already.
COMMIT the MySQL transaction.
If the MySQL COMMIT succeeded, COMMIT the PostgreSQL transaction. If it failed, ROLLBACK the PostgreSQL transaction and try the whole thing again.
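The ordering of the steps above can be sketched as below. This is only an illustration: two in-memory SQLite connections stand in for the PostgreSQL source and the MySQL target, and the `summary` table, its columns, the `id` column on `queue_table`, and the `sync_queue` helper are all hypothetical names. Real drivers for each database would replace `sqlite3`, but the commit order is the part being shown.

```python
import sqlite3


def sync_queue(source, target):
    """Copy queued rows to the target, committing the target before the source."""
    try:
        rows = source.execute(
            "SELECT id, name, total FROM queue_table ORDER BY id"
        ).fetchall()
        target.executemany(
            "INSERT INTO summary (name, total) VALUES (?, ?)",
            [(name, total) for _id, name, total in rows],
        )
        # Delete only the rows that were actually read, by id.
        source.executemany(
            "DELETE FROM queue_table WHERE id = ?", [(row[0],) for row in rows]
        )
        target.commit()  # commit the remote (target) side first
    except Exception:
        # Anything failed up to and including the target commit: undo both.
        target.rollback()
        source.rollback()
        raise
    # Reached only if the target commit succeeded. A failure here would leave
    # the rows on both sides, which is the gap two-phase commit closes.
    source.commit()
    return len(rows)


# Stand-in setup (assumption: SQLite instead of PostgreSQL and MySQL).
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE queue_table (id INTEGER PRIMARY KEY, name TEXT, total INTEGER)"
)
source.executemany(
    "INSERT INTO queue_table (name, total) VALUES (?, ?)", [("a", 1), ("b", 2)]
)
source.commit()

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE summary (name TEXT, total INTEGER)")

moved = sync_queue(source, target)
```

If the insert into the target fails, the source delete is rolled back and the queued rows are still there for the next hourly run.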
The PostgreSQL COMMIT is incredibly unlikely to fail because it's a local database, but if you need perfect reliability you can use two-phase commit on the PostgreSQL side, where you:
PREPARE TRANSACTION in PostgreSQL
COMMIT in MySQL
then either COMMIT PREPARED or ROLLBACK PREPARED in PostgreSQL depending on the outcome of the MySQL commit.
This is likely too complicated for your needs, but is the only way to be totally sure the change happens on both databases or neither, never just one.
BTW, seriously, if your MySQL is using MyISAM table storage, you should probably remedy that. It's vulnerable to data loss on crash, and it can't be transactionally updated. Convert to InnoDB.
Use DBI-Link in PostgreSQL
Maybe it's because I'm comfortable with PostgreSQL, but I'd do this using a PostgreSQL function that used DBI-link via PL/Perlu to do the job.
When replication should take place, I'd run a PL/PgSQL or PL/Perl procedure that uses DBI-Link to connect to the MySQL database and insert the data in the queue table.
Many examples exist for DBI-Link, so I won't repeat them here. This is a common use case.
Use a trigger to queue changes and DBI-link to sync
If you only want to copy new rows and your table is append-only, you could write a trigger procedure that appends all newly INSERTed rows into a separate queue table with the same definition as the main table. When you want to sync, your sync procedure can then in a single transaction LOCK TABLE the_queue_table IN EXCLUSIVE MODE;, copy the data, and DELETE FROM the_queue_table;. This guarantees that no rows will be lost, though it only works for INSERT-only tables. Handling UPDATE and DELETE on the target table is possible, but much more complicated.
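A small runnable sketch of the trigger-fed queue table, using Python with SQLite as a stand-in for PostgreSQL. The table names `main_table` and `the_queue_table`, the columns, the trigger name and the `drain_queue` helper are illustrative assumptions. SQLite has no `LOCK TABLE`, so the single transaction on one connection plays that role here; on PostgreSQL the explicit lock described above would still be needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE main_table (name TEXT, total INTEGER);
    CREATE TABLE the_queue_table (name TEXT, total INTEGER);

    -- Every row inserted into main_table is also appended to the queue.
    CREATE TRIGGER queue_new_rows AFTER INSERT ON main_table
    BEGIN
        INSERT INTO the_queue_table (name, total) VALUES (NEW.name, NEW.total);
    END;
    """
)


def drain_queue(conn):
    """Read and clear the queue inside one transaction; returns the rows."""
    with conn:  # commit on success, roll back on error
        rows = conn.execute(
            "SELECT name, total FROM the_queue_table ORDER BY rowid"
        ).fetchall()
        conn.execute("DELETE FROM the_queue_table")
    return rows


conn.execute("INSERT INTO main_table VALUES ('a', 1)")
conn.execute("INSERT INTO main_table VALUES ('b', 2)")
conn.commit()

queued = drain_queue(conn)
```

After the drain, `main_table` still holds both rows while the queue is empty, so the next sync only sees rows inserted since.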
Add MySQL to PostgreSQL with a foreign data wrapper
Alternately, for PostgreSQL 9.1 and above, I might consider using the MySQL Foreign Data Wrapper, ODBC FDW or JDBC FDW to allow PostgreSQL to see the remote MySQL table as if it were a local table. Then I could just use a writable CTE to copy the data.
WITH moved_rows AS (
    DELETE FROM queue_table RETURNING *
)
INSERT INTO mysql_table
SELECT * FROM moved_rows;
In short you have two scenarios:
1) Make destination pull the data from source into its own structure
2) Make source push out the data from its structure to destination
I'd rather try the second one. Look around and find a way to create a PostgreSQL trigger, some special "virtual" table, or maybe a PL/pgSQL function. Then, instead of an external script, you'll be able to run the procedure by executing a query from cron, or possibly from inside Postgres, since there are some options for scheduling operations there.
I'd choose the 2nd scenario because Postgres is much more flexible; when manipulating data in special, DIY ways you will simply have more possibilities.
An external script probably isn't a good solution, e.g. because you will need to treat binary data with special care, or convert dates from DATE to VARCHAR and then to DATE again. Inside an external script, various text-stored data will probably be just strings, and you will need to quote them too.
Straight to the question.
The problem: to do async bulk inserts (not necessarily bulk, if MySQL can handle it) using Node.js (coming from a .NET and PHP background)
Example :
Assume I have 40 (adjustable) functions doing some async work, each adding a record to the table after every iteration. It is very probable that more than one function will make an insertion call at the same time. Can MySQL handle that directly, considering there is going to be an auto-update field?
In C# (.NET) I would have used a DataTable to hold all the rows from each function, bulk-inserted the DataTable into the database table at the end, and launched many threads for the functions.
What approach would you suggest in this case?
Should the approach change if I need to handle 10,000 or 4 million rows per table?
Also, the DB schema is not going to change; would MongoDB be a better choice for this?
I am new to Node, NoSql and in the noob learning phase at the moment. So if you can provide some explanation to your answer, it would be awesome.
Thanks.
EDIT :
Answer: Neither MySQL nor MongoDB supports any sort of bulk insert; under the hood it is just a foreach loop.
Both of them are capable of handling a large number of connections simultaneously; the performance will largely depend on your requirements and production environment.
1) In MySQL, queries are executed sequentially per connection. If you are using one connection, your 40 or so functions will result in 40 queries being enqueued (via an explicit queue in the mysql library, your own code, or a system queue based on synchronisation primitives), not necessarily in the same order in which you started the 40 functions. MySQL won't have any race condition problems with auto-update fields in that case.
2) If you really want to execute 40 queries in parallel, you need to open 40 connections to MySQL (which is not a good idea from a performance point of view, but again, MySQL is designed to handle auto-increments correctly for multiple clients).
3) There is no special bulk insert command in the MySQL protocol at the wire level; any library exposing a bulk insert API is in fact just sending one long 'insert ... values' query.
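To illustrate point 3, here is a sketch of how a client might assemble one long multi-row 'insert ... values' statement itself. It is written in Python (not Node.js) and run against in-memory SQLite as a stand-in for MySQL; `build_multi_row_insert` and the `records` table are hypothetical names, and the table and column names are assumed to come from trusted code rather than user input, since they are formatted directly into the SQL text.

```python
import sqlite3


def build_multi_row_insert(table, columns, row_count):
    """Return 'INSERT ... VALUES (?, ?), (?, ?), ...' for row_count rows."""
    one_row = "(" + ", ".join("?" for _ in columns) + ")"
    return "INSERT INTO {} ({}) VALUES {}".format(
        table, ", ".join(columns), ", ".join([one_row] * row_count)
    )


rows = [(1, "Fred"), (2, "Wilma"), (3, "Barney")]
sql = build_multi_row_insert("records", ["id", "name"], len(rows))
flat_params = [value for row in rows for value in row]  # flatten row tuples

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER, name TEXT)")
conn.execute(sql, flat_params)  # one statement, one round trip
conn.commit()
inserted = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
```

Rows collected from the concurrent functions could be accumulated in a list and flushed this way periodically, which sends one statement per flush instead of one per row.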
SETUP
I have to insert a couple million rows in either SQL Server 2000/2005, MySQL, or Access. Unfortunately I don't have an easy way to use bulk insert or BCP or any of the other ways that a normal human would go about this. The inserts will happen on one particular database but that code needs to be db agnostic -- so I can't do bulk copy, or SELECT INTO, or BCP. I can however run specific queries before and after the inserts, depending on which database I'm importing to.
eg.
If IsSqlServer() Then
    DisableTransactionLogging();
ElseIf IsMySQL() Then
    DisableMySQLIndices();
End If
... do inserts ...
If IsSqlServer() Then
    EnableTransactionLogging();
ElseIf IsMySQL() Then
    EnableMySQLIndices();
End If
QUESTION
Are there any interesting things I can do to SQL Server that might speed up these inserts?
For example, is there a command I could issue to tell SQL Server, "Hey, don't bother recording these transactions in the transaction log".
Or maybe I could say, "Hey, I have a million rows coming in, so don't update your index until I'm totally finished".
ALTER INDEX [IX_TableIndex] ON Table DISABLE
... inserts
ALTER INDEX [IX_TableIndex] ON Table REBUILD
(Note: Above index disable only works on 2005, not 2000. Bonus points if you know a way to do this on 2000).
What about MySQL, and Access?
The single biggest thing that will kill performance here is the fact that (it sounds like) you're executing a million different INSERTs against the DB. Each INSERT is treated as a single operation. If you can do this as a single operation, then you will almost certainly have a huge performance improvement.
Both MySQL and SQL Server support 'selects' of constant expressions without a table name, so this should work as one statement:
INSERT INTO MyTable(ID, name)
SELECT 1, 'Fred'
UNION ALL SELECT 2, 'Wilma'
UNION ALL SELECT 3, 'Barney'
UNION ALL SELECT 4, 'Betty'
It's not clear to me if Access supports that, not having Access available. HOWEVER, Access does support constants in a SELECT, as far as I can tell, and you can coerce the above into ANSI SQL-92 (which should be supported by all 3 engines; it's about as close to 'DB agnostic' as you'll get) by just adding
FROM OneRowTable
to the end of every individual SELECT, where 'OneRowTable' is a table with just one row of dummy data.
This should let you insert a million rows of data in much much less than a million INSERT statements -- and things like index reshuffling will be done once, rather than a million times. You may have much less need for other optimisations after that.
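A hedged sketch of generating the statement above in batches from application code. It uses Python with in-memory SQLite as a stand-in for the three target engines; `build_union_insert` is a hypothetical helper, and the optional `from_table` argument corresponds to the `OneRowTable` variant. Engines cap the number of terms or parameters per statement, so the batch size here is deliberately small and would need tuning per database.

```python
import sqlite3


def build_union_insert(table, columns, row_count, from_table=None):
    """Return 'INSERT ... SELECT ?, ? [FROM t] UNION ALL SELECT ...'."""
    select = "SELECT " + ", ".join("?" for _ in columns)
    if from_table:
        select += " FROM " + from_table  # ANSI-style variant with a dummy table
    return "INSERT INTO {} ({}) {}".format(
        table, ", ".join(columns), " UNION ALL ".join([select] * row_count)
    )


rows = [(1, "Fred"), (2, "Wilma"), (3, "Barney"), (4, "Betty")]
sql = build_union_insert("MyTable", ["ID", "name"], len(rows), "OneRowTable")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ID INTEGER, name TEXT)")
conn.execute("CREATE TABLE OneRowTable (dummy INTEGER)")
conn.execute("INSERT INTO OneRowTable VALUES (1)")  # exactly one dummy row
conn.execute(sql, [value for row in rows for value in row])
conn.commit()
names = [r[0] for r in conn.execute("SELECT name FROM MyTable ORDER BY ID")]
```

Because `OneRowTable` holds exactly one row, each `SELECT ... FROM OneRowTable` contributes exactly one inserted row.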
Is this a regular process or a one-time event?
I have, in the past, just scripted out the current indexes, dropped them, inserted the rows, then just re-added the indexes.
The SQL Management Studio can script out the indexes from the right click menus...
For SQL Server:
You can set the recovery model to "Simple", so your transaction log will be kept small. Do not forget to set it back afterwards.
Disabling the indexes is actually a good idea. This will work on SQL 2005, not on SQL Server 2000.
alter index [INDEX_NAME] on [TABLE_NAME] disable
And to enable
alter index [INDEX_NAME] on [TABLE_NAME] rebuild
And then just insert the rows one by one. You have to be patient, but at least it is somewhat faster.
If it is a one-time thing (or it happens often enough to justify automating this), also consider dropping/disabling all indexes, and then adding/re-enabling them again when the insert is done.
The trouble with setting the recovery model to simple is that it affects any other users entering data at the same time and thus will make their changes unrecoverable.
Same thing with disabling the indexes: this disables them for everyone and may make the database run slower than a slug.
Suggest you run the import in batches.
If this is not something that needs to be read terribly quickly, you can do an "Insert Delayed" into the table on MySQL. This allows your code to continue running without having to wait for the insert to actually happen. This does have some limitations, but if your primary concern is to get the program to finish quickly, this may help. Be warned that there is a nice long list of situations where this may not act as expected. Check the docs.
I do not know if this functionality works for Access or MS SQL, though.
Have you considered using the Factory pattern? I'm guessing you're writing the code for this, so with the factory pattern you could code up a factory that returns a concrete "IDataInserter"-type class that does the work for you.
This would still allow you to be data agnostic and get the fastest method for each type of database.
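One way the factory idea might look, sketched in Python. All class and function names here (`DataInserter`, `make_inserter`, and so on) are hypothetical. The SQL Server statements are the ones quoted in the question; the MySQL `ALTER TABLE ... DISABLE KEYS` / `ENABLE KEYS` pair is an assumption about what the question's `DisableMySQLIndices()` would issue, and it only affects non-unique indexes on MyISAM tables.

```python
from abc import ABC, abstractmethod


class DataInserter(ABC):
    """Knows which statements to run before and after a large insert."""

    def __init__(self, table, index=None):
        self.table = table
        self.index = index

    @abstractmethod
    def before_statements(self):
        ...

    @abstractmethod
    def after_statements(self):
        ...


class SqlServerInserter(DataInserter):
    # Index disable/rebuild as quoted in the question (SQL Server 2005 only).
    def before_statements(self):
        return ["ALTER INDEX [{}] ON [{}] DISABLE".format(self.index, self.table)]

    def after_statements(self):
        return ["ALTER INDEX [{}] ON [{}] REBUILD".format(self.index, self.table)]


class MySqlInserter(DataInserter):
    # Assumption: MyISAM table, where DISABLE KEYS defers non-unique indexes.
    def before_statements(self):
        return ["ALTER TABLE `{}` DISABLE KEYS".format(self.table)]

    def after_statements(self):
        return ["ALTER TABLE `{}` ENABLE KEYS".format(self.table)]


class GenericInserter(DataInserter):
    # Fallback (e.g. Access): no engine-specific tuning, plain inserts only.
    def before_statements(self):
        return []

    def after_statements(self):
        return []


def make_inserter(db_kind, table, index=None):
    """Factory: pick the concrete inserter for the given database kind."""
    kinds = {"sqlserver": SqlServerInserter, "mysql": MySqlInserter}
    return kinds.get(db_kind, GenericInserter)(table, index)


inserter = make_inserter("sqlserver", "Table", "IX_TableIndex")
setup_sql = inserter.before_statements()
```

The calling code then runs `before_statements()`, performs its database-agnostic inserts, and runs `after_statements()`, without any `If IsSqlServer()` branches of its own.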
SQL Server 2000/2005, MySQL, and Access can all load directly from a tab / CR delimited text file; they just have different commands to do it. If you've got the case statement to determine which DB you're importing into, just figure out each one's preferred way of importing a text file.
Can you use DTS (2000) or SSIS (2005) to build a package to do this? DTS and SSIS can both pull from the same source and pipe out to the different potential destinations. Go for SSIS if you can. There's a lot of good, fast technology in there along with functionality to embed the IsSQLServer, IsMySQL, etc. logic.
It's worth considering breaking your inserts into smaller batches; a single transaction with lots of queries will be slow.
You might consider using SQL's bulk-logged recovery model during your bulk insert.
http://msdn.microsoft.com/en-us/library/ms190422(SQL.90).aspx
http://msdn.microsoft.com/en-us/library/ms190203(SQL.90).aspx
You might also disable the indexes on the target table during your inserts.