Faster way to concurrently insert data into MySQL - mysql

I have a parallel process with about 64 children that each need to insert data into a landing table. I am currently using the MySQL MyISAM engine, and I disable keys before the inserts and re-enable them afterwards.
However, this seems to be a huge bottleneck in my process. I believe MySQL is taking a table lock for each insert, so the processes are constantly waiting to write.
The inserts are independent and there is no danger of conflicting inserts. This also does not need transactions or anything of that nature.
Is there a different engine, or ways to improve the insert/write performance of MySQL?
I have thought about instantiating a table for each process, but this would make the code more complex, and that is not really my style...
Any suggestions would be greatly appreciated.
Thanks!

As documented under INSERT DELAYED Syntax:
The DELAYED option for the INSERT statement is a MySQL extension to standard SQL that is very useful if you have clients that cannot or need not wait for the INSERT to complete.
[ deletia ]
Another major benefit of using INSERT DELAYED is that inserts from many clients are bundled together and written in one block. This is much faster than performing many separate inserts.
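For illustration, here is a minimal sketch of what that looks like from a client, in Python with MySQLdb (the connection parameters and the landing table with columns a and b are hypothetical; note that INSERT DELAYED only works with MyISAM, MEMORY and ARCHIVE tables, and was removed from later MySQL versions):

    import MySQLdb

    conn = MySQLdb.connect(host="localhost", user="user", passwd="secret", db="test")
    cur = conn.cursor()

    # The server acknowledges immediately; the row is queued and written
    # when the table is not in use by any other thread.
    cur.execute("INSERT DELAYED INTO landing (a, b) VALUES (%s, %s)", (1, "x"))

    conn.close()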

MyISAM does indeed lock the whole table when inserting, updating or deleting. InnoDB supports transactions and row-level locking.
You can also look into LOAD DATA INFILE, which is faster for bulk inserts.
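A sketch of the LOAD DATA approach, again in Python (the file path, table and column names are made up; the LOCAL variant also requires local_infile to be enabled on the server):

    import csv
    import MySQLdb

    # Stage the rows in a flat file first, then hand the whole file to the server.
    rows = [(1, "alpha"), (2, "beta"), (3, "gamma")]
    with open("/tmp/landing.csv", "w") as f:
        csv.writer(f).writerows(rows)

    conn = MySQLdb.connect(host="localhost", user="user", passwd="secret",
                           db="test", local_infile=1)
    cur = conn.cursor()
    cur.execute(
        "LOAD DATA LOCAL INFILE '/tmp/landing.csv' "
        "INTO TABLE landing "
        "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n' (a, b)"
    )
    conn.commit()
    conn.close()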

Related

Multiple queries in mysql to the information schema

I am using MySQL and I would like to know: if I make multiple SELECT statements simultaneously to get information from the information schema, how are these queries handled? Could this cause any database problems?
Since you are using the MyISAM storage engine and are worried about concurrent SELECT statements:
Reads (SELECT) can happen concurrently as long as there is no write (INSERT, UPDATE, DELETE or ALTER TABLE), i.e. you can have either one writer or several readers.
Otherwise, the operations are queued and executed as soon as possible.
There is a special case: concurrent inserts.
Note: if you are wondering about the choice between the two main MySQL storage engines, MyISAM and InnoDB, InnoDB is usually a good choice; please read this SO question.
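For what it's worth, the concurrent-insert special case is controlled by the concurrent_insert system variable; a sketch of changing and inspecting it (requires the SUPER privilege; connection details hypothetical):

    import MySQLdb

    conn = MySQLdb.connect(host="localhost", user="root", passwd="secret", db="test")
    cur = conn.cursor()

    # 0 = disabled, 1 = allowed when the table has no holes (the default),
    # 2 = allowed even when the table has holes from deleted rows.
    cur.execute("SET GLOBAL concurrent_insert = 2")

    cur.execute("SHOW VARIABLES LIKE 'concurrent_insert'")
    print(cur.fetchone())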

Would you expect MySQL to drop records if bombed by inserts on the same table?

I don't have a testing environment for this yet. But before I think too much about solutions I'd like to know if people think this would be a problem.
I will have 10-20 Java processes connected to a MySQL database via JDBC. Each will be inserting unique records, all going to the same table. The rate of inserts will be on the order of thousands per second.
My expectation is that some process will attempt to insert and encounter a table lock while another process is inserting, and this will result in a JDBC exception and that insert to fail.
Clearly, if you increase the insert rate sufficiently, there will eventually be a point where some buffer somewhere fills up faster than it can be emptied. When such a buffer hits its maximum capacity and can't contain any more data, some of your insert statements will have to fail. This will result in an exception being thrown at the client.
However, assuming you have high-end hardware, I don't imagine this should happen with 1000 inserts per second, but it does depend on the specific hardware, how much data there is per insert, how many indexes you have, what other queries are running on the system simultaneously, and so on.
Regardless of whether you are doing 10 inserts per second or 1000 you still shouldn't blindly assume that every insert will succeed - there's always a chance that an insert will fail because of some network communication error or some other problem. Your application should be able to correctly handle the situation where an insert fails.
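A sketch of that defensive handling on the client side, in Python for illustration (the retry policy and names are hypothetical; the exception class is from the MySQLdb driver):

    import time
    import MySQLdb

    def insert_with_retry(conn, sql, params, retries=3):
        # Attempt an insert, retrying transient failures with backoff.
        for attempt in range(retries):
            try:
                cur = conn.cursor()
                cur.execute(sql, params)
                conn.commit()
                return True
            except MySQLdb.OperationalError:
                # Network hiccup, lock wait timeout, overloaded server, ...
                conn.rollback()
                time.sleep(2 ** attempt)  # exponential backoff
        return False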
Use InnoDB, as it supports reads and writes at the same time. MyISAM usually locks the whole table during an insert and, by default, gives write locks priority over SELECT statements. This can cause issues if you're doing reporting or visualization of the data while also trying to do inserts.
If you have a natural primary key (no auto_increment), using it will help avoid some lock contention around InnoDB's auto-increment handling. (Newer versions have fixed this.)
http://www.mysqlperformanceblog.com/2007/09/26/innodb-auto-inc-scalability-fixed/
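If you do switch engines, the change itself is a single DDL statement; a sketch against a hypothetical landing table (the rebuild copies all rows, so expect it to take a while on a large table):

    import MySQLdb

    conn = MySQLdb.connect(host="localhost", user="user", passwd="secret", db="test")
    cur = conn.cursor()

    # Rebuilds the table with the InnoDB engine (row-level locking).
    cur.execute("ALTER TABLE landing ENGINE=InnoDB")
    conn.close()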
You might also want to see if you can queue your writes in memory and send them to the database in batches. Lots of little inserts will be much slower than doing batches in transactions.
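A minimal sketch of that batching idea, in Python with MySQLdb (table, columns and batch size are hypothetical; executemany lets the driver collapse simple INSERTs into one multi-row statement):

    import MySQLdb

    BATCH_SIZE = 500
    buffer = []

    conn = MySQLdb.connect(host="localhost", user="user", passwd="secret", db="test")
    cur = conn.cursor()

    def flush():
        if not buffer:
            return
        # One round trip and one commit per batch instead of per row.
        cur.executemany("INSERT INTO landing (a, b) VALUES (%s, %s)", buffer)
        conn.commit()
        del buffer[:]

    def queue_row(a, b):
        # Accumulate rows in memory and flush when the batch is full.
        buffer.append((a, b))
        if len(buffer) >= BATCH_SIZE:
            flush()

    for i in range(2000):
        queue_row(i, "value-%d" % i)
    flush()  # flush the final partial batch
    conn.close()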
Good presentation on getting the most out of the MySQL Connector/J JDBC driver:
http://assets.en.oreilly.com/1/event/21/Connector_J%20Performance%20Gems%20Presentation.pdf
What engine do you use? That can make a difference.
http://dev.mysql.com/doc/refman/5.5/en/concurrent-inserts.html

Is MySQL InnoDB appropriate for this scenario?

My MySQL database contains multiple MyISAM tables, with each table containing millions of rows. There is a heavy insert load on the database, so I cannot issue SELECTs against that live database. Instead, I create a replica of the database for queries and conduct analysis on that.
For the analysis, I need to issue multiple parallel queries. The queries are independent (i.e., the results of the queries are not combined together), but they operate on the same tables most of the time. As far as I know, the entire MyISAM table is locked for each query, which means parallel independent queries would be slow. Ideally, I would prefer an engine that supports "no locking". I am assuming MySQL doesn't have such an engine, so should I use InnoDB? I might be missing a lot of things here. Please suggest the right path to take here.
Thanks
MyISAM read locks are compatible, so the SELECT queries won't lock each other.
If your analysis queries on the replica database only read and don't write, then it's OK to use MyISAM.
You could stick to MyISAM and use INSERT DELAYED:
When a client uses INSERT DELAYED, it gets an okay from the server at once, and the row is queued to be inserted when the table is not in use by any other thread.
Another major benefit of using INSERT DELAYED is that inserts from many clients are bundled together and written in one block. This is much faster than performing many separate inserts.

Fork MySQL INSERT INTO (InnoDB)

I'm trying to insert about 500 million rows of garbage data into a database for testing. Right now I have a PHP script looping through a few SELECT/INSERT statements each inside a TRANSACTION -- clearly this isn't the best solution. The tables are InnoDB (row-level locking).
I'm wondering: if I (properly) fork the process, will this speed up the INSERT process? At the rate it's going, it will take 140 hours to complete. I'm concerned about three things:
If INSERT statements must acquire a write lock, then will it render forking useless, since multiple processes can't write to the same table at the same time?
I'm using SELECT...LAST_INSERT_ID() (inside a TRANSACTION). Will this logic break when multiple processes are INSERTing into the database? I could create a new database connection for each fork, so I hope this would avoid the problem.
How many processes should I be using? The queries themselves are simple, and I have a regular dual-core dev box with 2GB RAM. I set up my InnoDB to use 8 threads (innodb_thread_concurrency=8), but I'm not sure if I should be using 8 processes or if this is even a correct way to think about matching.
Thanks for your help!
The MySQL documentation has a discussion on efficient insertion of a large number of records. It seems that the clear winner is the LOAD DATA INFILE command, followed by INSERT statements with multiple VALUES lists.
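A sketch of the multiple-VALUES-list form, in Python for illustration (table, columns and chunk size are made up; each statement carries many rows, so the per-statement overhead is amortized):

    import MySQLdb

    conn = MySQLdb.connect(host="localhost", user="user", passwd="secret", db="test")
    cur = conn.cursor()

    rows = [(i, "garbage-%d" % i) for i in range(1000)]

    # Build INSERT INTO ... VALUES (...), (...), ... in chunks of 100 rows.
    CHUNK = 100
    for start in range(0, len(rows), CHUNK):
        chunk = rows[start:start + CHUNK]
        placeholders = ", ".join(["(%s, %s)"] * len(chunk))
        flat = [value for row in chunk for value in row]
        cur.execute("INSERT INTO landing (a, b) VALUES " + placeholders, flat)

    conn.commit()
    conn.close()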
1) Yes, there will be lock contention, but InnoDB is designed to handle multiple threads trying to insert. Sure, they won't insert simultaneously, but it will handle serializing the inserts for you. Just make sure you explicitly close your transactions, and do it as soon as possible. This will ensure you get the best possible insert performance.
2) No, this logic will not break, provided you have one connection per thread, since LAST_INSERT_ID() is connection-specific.
3) This is one of those things that you just need to benchmark to figure out. Actually, I would make the program self-adjust: run 100 inserts with 8 threads and record the execution times, then try again with half as many and twice as many. Whichever is faster, benchmark more thread counts around that number.
In general, you should just go ahead and benchmark this kind of thing to see which is faster. In the amount of time it takes you to think about it and write it up, you could probably already have preliminary numbers.
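A sketch of the forking approach, in Python with multiprocessing rather than PHP (connection details and table are hypothetical); per point 2, the key detail is that each process opens its own connection, so LAST_INSERT_ID() stays isolated per connection:

    import multiprocessing
    import MySQLdb

    def worker(n_rows):
        # One connection per process: LAST_INSERT_ID() is connection-scoped,
        # so the workers cannot see each other's values.
        conn = MySQLdb.connect(host="localhost", user="user",
                               passwd="secret", db="test")
        cur = conn.cursor()
        for i in range(n_rows):
            cur.execute("INSERT INTO landing (a, b) VALUES (%s, %s)",
                        (i, "garbage"))
            if i % 500 == 0:
                conn.commit()  # commit promptly; long transactions hurt
        conn.commit()
        conn.close()

    if __name__ == "__main__":
        # Benchmark with different process counts (4, 8, 16, ...) as suggested.
        procs = [multiprocessing.Process(target=worker, args=(10000,))
                 for _ in range(8)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()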

Efficiently inserting / updating 1000's records with mysql transaction help?

I have to insert, and then keep updating, thousands of records every minute. Will a MySQL transaction help with speed, as it does in SQLite3?
I carried out a test (inserting 1,300 records), but it showed the same result with InnoDB (with a transaction) and MyISAM.
I'm using MySQLdb for Python.
Transactions aren't about performance; they're about data integrity and consistency.
I think that transactions will slow things down, because MySQL will have to maintain transaction logs and rollback segments. The benefit is that they ensure each INSERT keeps its integrity.
InnoDB maintains referential integrity; MyISAM does not. If you observed no difference, I'm guessing it's because you didn't have transaction boundaries set.
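To make the comparison meaningful, set explicit transaction boundaries. A sketch of grouping the 1,300 inserts into a single transaction (table and columns hypothetical; InnoDB assumed, since MyISAM ignores transactions):

    import MySQLdb

    conn = MySQLdb.connect(host="localhost", user="user", passwd="secret", db="test")
    cur = conn.cursor()

    # MySQLdb runs with autocommit off by default, so this loop is one
    # transaction; InnoDB flushes the log once at the commit instead of
    # once per statement.
    for i in range(1300):
        cur.execute("INSERT INTO landing (a, b) VALUES (%s, %s)", (i, "row"))
    conn.commit()
    conn.close()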
Try batch inserts. I can't say anything about the subsequent updates.
I don't know if this will help for MySQL, but for a large-scale Oracle system I was involved with at a previous job, the inserts/updates were done in batches of hundreds or thousands to maximize throughput. A lot of the grunt work was done server-side in PL/SQL.