InnoDB Performance Optimization - MySQL

One portion of my site requires a bulk insert, and it takes around 40 minutes for InnoDB to load that file into the database. I have been digging around the web and found a few things:
innodb_autoinc_lock_mode=2 (it won't generate consecutive keys)
UNIQUE_CHECKS=0 (disable unique key checks)
FOREIGN_KEY_CHECKS=0 (disable foreign key checks)
--log_bin=OFF (turn off the binary log used for replication)
Problem
I want to set the first three options for just one session, i.e. during the bulk insert. The first option does not work; MySQL says unknown system variable 'innodb_autoinc_lock_mode'. I am using MySQL 5.0.4.
As for the last option, I would like to turn it off, but if I need replication later, will it just start working again when I turn it back on?
Suggestions
Any other suggestions on how to improve bulk inserts/updates for the InnoDB engine? Comments on my findings are also welcome.
Thanks

Assuming you are loading the data in a single transaction (or a few), most of the time is likely to be spent building indexes (depending on the schema of the table).
Do not do a large number of small inserts with autocommit enabled; that will destroy performance with a sync for each commit.
If your table is bigger than (or nearly as big as) the InnoDB buffer pool, you are in trouble; a table which can't fit in RAM with its indexes cannot be inserted into efficiently, as it will have to do reads to insert, so that existing index blocks can be updated.
Remember that disc writes are OK (they are mostly sequential, and you have a battery-backed RAID controller, right?), but reads are slow and need to be avoided.
In summary
Do the insert in a small number of big-ish transactions, say 10k-100k rows each (see the sketch after this list). Don't make the transactions too big or you'll exhaust the logs.
Get enough RAM that your table fits in memory; set the InnoDB buffer pool appropriately (you are running x86_64, right?).
Don't worry about the operation taking a long time: thanks to MVCC, your app will be able to operate on the previous versions of the rows, assuming it's only reading.
Don't make any of the optimisations listed above; they're probably a waste of time (don't take my word for it - benchmark the operation on a test system in your lab with/without them).
Turning unique checks off is actively dangerous as you'll end up with broken data.
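As a concrete illustration of the batching advice above, here is a minimal sketch; the table and column names are hypothetical:

SET autocommit = 0;
START TRANSACTION;
INSERT INTO my_big_table (col_a, col_b) VALUES
  (1, 'first'),
  (2, 'second');
/* ...more multi-row INSERTs until the batch reaches ~10k-100k rows... */
COMMIT;
START TRANSACTION;
/* ...next batch... */
COMMIT;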

To answer the last part of your question: no, it won't just start working again; if the inserts are not replicated but subsequent updates are, the result will not be a pretty sight. Disabling foreign and unique keys should be OK, provided you re-enable them afterwards and deal with any constraint violations.
How often do you have to do this? Can you load smaller datasets more frequently?


Which engine should be used for more than 100 insert queries per second?
I have read about the differences and the pros and cons of MyISAM and InnoDB.
But I am still confused: for 100+ insert queries per second into a table (basically for tracking purposes), which engine should I use?
I referred to What's the difference between MyISAM and InnoDB?
Based on my understanding, MyISAM will lock the whole table for each insert, and hence InnoDB should be used for its row-level locking.
But on the other hand, MyISAM's performance is supposedly 100 times better. So what is the optimal and correct selection, and why?
Simple code that does one-row INSERTs without any tuning maxes out at about 100 rows per second in any engine, especially InnoDB.
But, it is possible to get 1000 rows per second or even more.
The quick fix for InnoDB is to set innodb_flush_log_at_trx_commit = 2; that will uncork the main thing stopping InnoDB at 100 inserts/second using a commodity spinning disk. Setting innodb_buffer_pool_size to about 70% of available RAM is also important.
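For illustration, a hedged sketch of those two settings; the 11G figure is just an example for a 16 GB server:

SET GLOBAL innodb_flush_log_at_trx_commit = 2; /* dynamic; no restart needed */
/* innodb_buffer_pool_size is normally set in my.cnf and read at startup:
     [mysqld]
     innodb_buffer_pool_size = 11G
*/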
If a user is inserting multiple rows into the same table at the same time, then LOAD DATA or a batch Insert (INSERT ... VALUES (...), (...), ...) of 100 rows or more will insert ten times as fast. This applies to any Engine.
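A minimal example of such a batched INSERT (the table and columns are hypothetical):

INSERT INTO tracking (user_id, event, created_at) VALUES
  (1, 'click', NOW()),
  (2, 'view',  NOW()),
  (3, 'click', NOW());
/* ...continue to 100 or more row tuples per statement... */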
MyISAM is not 100 times as fast; it is not even 10 times as fast as InnoDB. Today (5.6 or newer), you would be hard pressed to find a well tuned application that is more than a little faster in MyISAM. You are, or will be, I/O-limited.
As for corruption -- No engine suffers from corruption except during a crash. A power failure may mangle MyISAM indexes, usually recoverably. Moreover, a batch insert could be half done. InnoDB will be clean -- the entire batch is done or none of it is done; no corruption.
ARCHIVE saves disk space, but costs CPU.
MEMORY is often faster because it has no I/O. But you have too much data for that Engine, correct?
MariaDB with TokuDB can probably run faster than anything I describe here; but you have not indicated the need for it.
100 rows inserted per second = 8M/day = 3 Billion/year. Will you be purging the data eventually? Will you be querying the data? Purging: Let's talk about PARTITION. Querying: Let's talk about Summary Tables.
Indexing: Minimize the number of indexes. If you have a 'random' index, such as a UUID, and you have a billion rows, you will be stuck with 100 rows/second, regardless of which Engine and regardless of any tuning. Do I need to explain further?
If this is a queuing system, I say "Don't queue it, just do it."
Bottom line: use InnoDB. Tune it. Use batch inserts. Avoid random indexes. Etc.
You are correct that MyISAM is a faster choice if your operational use case is lots of insertions. But that answer can change drastically based on the kind of use you make of the data. If this is an archival application you might consider the ARCHIVE storage engine. It is best for write-once, read-rarely applications.
You should investigate INSERT DELAYED as it will allow your client programs to fire-and-forget these inserts rather than waiting for completion. This burns RAM in your mysqld process, though. If that style of operation meets your needs, this is a compelling reason to go with MyISAM.
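A hypothetical example of that fire-and-forget style (note that INSERT DELAYED works only with engines such as MyISAM and MEMORY, and was removed in MySQL 5.7):

INSERT DELAYED INTO tracking (user_id, event) VALUES (1, 'click');
/* the client returns immediately; mysqld buffers the row in RAM */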
Beware indexes in the target table of your inserts. Maintaining indexes is a big part of the server's insert workload.
Don't forget to look into MariaDB. It's a compatible fork of MySQL with some more advanced storage engines and features.
I have experience with a similar application. In our case, the application scaled up beyond the original insert rate, and the server could not keep up. (It's always good when an application workload grows!) We ended up doing two things, one after the other:
Using a message queuing system, and running just a couple of processes to actually do the inserts. The original clients wrote their logging records to the message queue rather than directly to the database. (Amazon AWS's SQS is an example of such a queuing system).
Reworking the insert process to use LOAD DATA INFILE to load great gobs of log rows at once (see the sketch below).
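A hedged sketch of that bulk-load step; the file path, table, and columns are assumptions:

LOAD DATA INFILE '/var/tmp/log_batch.csv'
INTO TABLE request_log
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(user_id, url, created_at);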
(You probably have figured out that this kind of workload isn't feasible on a cheap shared hosting service or an AWS micro instance.)

MySQL Memory Engine vs InnoDB on RAMdisk

I'm writing a bit of software that needs to flatten data from a hierarchical type of format into tabular format. Instead of doing it all in a programming language every time and serving it up, I want to cache the results for a few seconds, and use SQL to sort and filter. When in use, we're talking 400,000 writes and 1 or 2 reads over the course of those few seconds.
Each table will contain 3 to 15 columns. Each row will contain from 100 bytes to 2,000 bytes of data, although it's possible that in some cases, some rows may get up to 15,000 bytes. I can clip data if necessary to keep things sane.
The main options I'm considering are:
MySQL's Memory engine
A good option, almost specifically written for my use case! But... "MEMORY tables use a fixed-length row-storage format. Variable-length types such as VARCHAR are stored using a fixed length. MEMORY tables cannot contain BLOB or TEXT columns." Unfortunately, I do have text fields with a length up to maybe 10,000 characters - and even that is a number that is not specifically limited. I could adjust the VARCHAR length based on the max length of the text columns as I loop through doing my flattening, but that's not totally elegant. Also, for my occasional 15,000-character row, does that mean I need to allocate 15,000 characters for every row in the table? With 100,000 rows, that's about 1.5 GB, not including overhead!
InnoDB on RAMDisk
This is meant to run in the cloud, and I could easily spin up a server with 16 GB of RAM, configure MySQL to write to tmpfs, and use full-featured MySQL. My concern with this is space. While I'm sure the MEMORY engine was written to prevent it from consuming all temp storage and crashing the server, I doubt this solution would know when to stop. How much actual space will my 2,000 bytes of data consume when in database format? How can I monitor it?
Bonus Questions
Indexes
I will in fact know in advance which columns need to be filtered and sorted by. I could set up an index before I do the inserts, but what kind of performance gain could I honestly expect on top of a RAM disk? How much extra overhead do indexes add?
Inserts
I'm assuming inserting multiple rows with one query is faster. But the one query, or series of large queries, is itself held in memory while we're writing to memory, so if I did that I'd momentarily need double the memory. So then we talk about doing one or two or a hundred at a time, and having to wait for that to complete before processing more. InnoDB doesn't lock the table, but I worry about sending two queries too close to each other and confusing MySQL. Is this a valid concern? With the MEMORY engine I'd definitely have to wait for completion, due to table locks.
Temporary
Are there any benefits to temporary tables other than the fact that they're deleted when the db connection closes?
I suggest you use MyISAM. Create your table with appropriate indexes for your query. Then disable keys, load the table, and enable keys.
I suggest you develop a discipline like this for your system. I've used a similar discipline very effectively.
Keep two copies of the table. Call one table_active and the second one table_loading.
When it's time to load a new copy of your data, use commands like this.
ALTER TABLE table_loading DISABLE KEYS;
/* do your insertions here, to table_loading */
/* consider using LOAD DATA INFILE if it makes sense. */
ALTER TABLE table_loading ENABLE KEYS; /* this will take a while */
/* at this point, suspend your software that's reading table_active */
RENAME TABLE table_active TO table_old;
RENAME TABLE table_loading TO table_active;
/* now you can resume running your software */
TRUNCATE TABLE table_old;
RENAME TABLE table_old TO table_loading;
Alternatively, you can DROP TABLE table_old; and create a new table for table_loading instead of the last rename.
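A minimal sketch of that alternative ending (this assumes table_loading should match table_active's definition):

DROP TABLE table_old;
CREATE TABLE table_loading LIKE table_active;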
This two-table (double-buffered) strategy should work pretty well. It will create some latency because your software that's reading the table will work on an old copy. But you'll avoid reading from an incompletely loaded table.
I suggest MyISAM because you won't run out of RAM and blow up and you won't have the fixed-row-length overhead or the transaction overhead. But you might also consider MariaDB and the Aria storage engine, which does a good job of exploiting RAM buffers.
If you do use the MEMORY storage engine, be sure to tweak the max_heap_table_size system variable. If your read queries will use index range scans (sequential index access), be sure to specify BTREE-style indexes. See here: http://dev.mysql.com/doc/refman/5.1/en/memory-storage-engine.html
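A hedged sketch of both points; the table, columns, and size are illustrative assumptions:

SET max_heap_table_size = 512 * 1024 * 1024; /* raise the per-table cap first */
CREATE TABLE cache_results (
  id INT,
  score INT,
  INDEX idx_score (score) USING BTREE /* MEMORY defaults to HASH indexes */
) ENGINE=MEMORY;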

MySQL: Why is DELETE more CPU intensive than INSERT?

I'm currently taking the course "Performance Evaluation" at university, and we're now doing an assignment where we are testing the CPU usage on a PHP and MySQL-database server. We use httperf to create custom traffic, and vmstat to track the server load. We are running 3000 connections to the PHP-server, for both INSERT and DELETE (run separately).
Numbers show that the DELETE operation is a lot more CPU intensive than INSERT — and I'm just wondering why?
I initially thought INSERT required more CPU usage, as indexes would need to be recreated, data needed to be written to disk, etc. But obviously I'm wrong, and I'm wondering if anyone can tell me the technical reason for this.
At least with InnoDB (and I hope they have you using it), you have more operations even with no foreign keys. An insert is roughly this:
Insert row
Mark in binary log buffer
Mark commit
Deletions do the following:
Mark row removed (taking the same hit as an insertion -- page is rewritten)
Mark in binary log buffer
Mark committed
Actually go remove the row, (taking the same hit as an insertion -- page is rewritten)
The purge thread tracks deletions in the binary log buffer, too.
For that reason, you've got twice the work going on for a delete as for an insert. A delete requires those two writes because the row must be marked as removed for all versions going forward, but it can only be physically removed when no transactions remain which can see it. Because InnoDB only writes full blocks to the disk, the modification penalty for a block is constant.
DELETE also requires data to be written to disk, plus recalculation of indexes, and in addition, a set of logical comparisons to find the record(s) you are trying to delete in the first place.
Delete requires more logic than you think; how much so depends on the structure of the schema.
In almost all cases, when deleting a record, the server must check for any dependencies upon that record as a foreign key reference. That, in a nutshell, is a query of the system tables looking for table definitions with a foreign key reference to this table, then a select on each of those tables for records referencing the record to be deleted. Right there you've increased the computational time by a couple of orders of magnitude, regardless of whether the server does cascading deletes or just throws back an error.
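To make that concrete, a hypothetical schema where even a single-row DELETE forces a lookup in a child table:

CREATE TABLE parent (id INT PRIMARY KEY) ENGINE=InnoDB;
CREATE TABLE child (
  id INT PRIMARY KEY,
  parent_id INT,
  FOREIGN KEY (parent_id) REFERENCES parent(id)
) ENGINE=InnoDB;
DELETE FROM parent WHERE id = 42; /* must also probe child.parent_id */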
Self-balancing internal data structures would also have to be reorganized, and indexes would have to be updated to remove any now-empty branches of the index trees, but these would have counterparts in the Insert operations.

Will a MySQL table with 20,000,000 records be fast with concurrent access?

I ran a lookup test against an indexed MySQL table containing 20,000,000 records, and according to my results, it takes 0.004 seconds to retrieve a record given an id--even when joining against another table containing 4,000 records. This was on a 3GHz dual-core machine, with only one user (me) accessing the database. Writes were also fast, as this table took under ten minutes to create all 20,000,000 records.
Assuming my test was accurate, can I expect performance to be as snappy on a production server with, say, 200 users concurrently reading from and writing to this table?
I assume InnoDB would be best?
That depends on the storage engine you're going to use and the read/write ratio.
InnoDB will be better if there are lots of writes. If it's mostly reads with a very occasional write, MyISAM might be faster. MyISAM uses table-level locking, so it locks up the whole table whenever you need to update. InnoDB uses row-level locking, so you can have concurrent updates on different rows.
InnoDB is definitely safer, so I'd stick with it anyhow.
BTW, remember that RAM is very cheap right now, so buy a lot.
Depends on any number of factors:
Server hardware (Especially RAM)
Server configuration
Data size
Number of indexes and index size
Storage engine
Writer/reader ratio
I wouldn't expect it to scale that well. More importantly, this kind of thing is too important to speculate about. Benchmark it and see for yourself.
Regarding the storage engine, I wouldn't dare use anything but InnoDB for a table of that size that is both read from and written to. With MyISAM, any write query that isn't a primitive insert or a single-row update will end up locking the whole table, which yields terrible performance.
There's no reason that MySQL couldn't handle that kind of load without any significant issues. There are a number of other variables involved, though (otherwise it's a 'how long is a piece of string' question). Personally, I've had a number of tables in various databases that are well beyond that range. For example:
How large is each record (on average)?
How much RAM does the database server have, and how much of it is allocated to the various caches of MySQL/InnoDB?
A default configuration may only allow for an 8 MB buffer between disk and client (which might work fine for a single user), but trying to fit a 6 GB+ database through that is doomed to failure. That problem was real, by the way, and was causing several crashes a day of a database/website until I was brought in to troubleshoot it.
If you are likely to do a great deal more with that database, I'd recommend getting someone with a little more experience, or at least doing what you can to give it some optimisations. Reading 'High Performance MySQL, 2nd Edition' is a good start, as is looking at some tools like Maatkit.
As long as your schema design and DAL are constructed well enough, you understand query optimization inside out, can adjust all the server configuration settings at a professional level, and have "enough" hardware properly configured, yes (except for sufficiently pathological cases).
Same answer for both engines.
You should probably perform a load test to verify, but as long as the indexes are created properly (meaning they are optimized for your query statements), the SELECT queries should perform at an acceptable speed (the INSERTs and/or UPDATEs may be more of a speed issue, though, depending on how many indexes you have and how large they get).
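One quick way to sanity-check that a lookup uses an index (the table name and id value are hypothetical):

EXPLAIN SELECT * FROM big_table WHERE id = 123456;
/* "key: PRIMARY" and "type: const" in the output indicate a fast point lookup */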

Are there any pitfalls / things you need to know when changing from MyISAM to InnoDB

One of my projects use the MyISAM engine in MySQL, but I'm considering changing it to InnoDB as I need transaction support here and there.
What should I look at or consider before doing this?
Can I just change the engine, or should the data be prepared for it?
Yes, absolutely; there are many things, and you should test your application extremely thoroughly:
Transactions can deadlock and need to be repeated. This is the case (in some circumstances) even with an autocommitted transaction which only inserts one row.
Disc usage will almost certainly increase
I/O load during writes will almost certainly increase
Behaviour of indexing will change because InnoDB uses clustered indexes - this may be a beneficial effect in some cases
Your backup strategy will be impacted. Consider this carefully.
The migration process itself will need to be carefully planned, as it will take a long time if you have a lot of data (during which time the data will be either readonly, or completely unavailable - do check!)
There is one big caveat. If you get any kind of hardware failure (or similar) during a write, InnoDB will corrupt tables.
MyISAM will also, but a mysqlcheck --auto-repair will repair them. Trying this with InnoDB tables will fail. Yes, this is from experience.
This means you need to have a good regular data backup plan to use InnoDB.
Some other notes:
InnoDB does not give free space back to the filesystem after you drop a table/database or delete records; this can be solved by dumping and re-importing, or by setting innodb_file_per_table=1 in my.cnf.
Adding/removing indexes on a large InnoDB table can be quite painful, because it locks the table, creates a temporary one with your altered indexes, and inserts the data row by row. There is a plugin from Innobase, but it works only for MySQL 5.1.
InnoDB is also MUCH more memory-intensive. I suggest you make the innodb_buffer_pool_size variable as large as your server memory allows (70-80% should be a safe bet). If your server is UNIX/Linux, consider reducing the sysctl variable vm.swappiness to 0, and use innodb_flush_method=O_DIRECT to avoid double buffering (see the sketch after these notes). Always test whether you hit swap when toggling those values. You can always read more at the Percona blog, which is great.
Also, you can run mysqldump with --single-transaction --skip-lock-tables and have no table locks while the backup is running.
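A hedged sketch of the memory settings mentioned above, for a hypothetical dedicated 16 GB Linux server (the sizes are illustrative assumptions, not recommendations):

[mysqld]
innodb_buffer_pool_size = 12G
innodb_flush_method = O_DIRECT

and, as root: sysctl -w vm.swappiness=0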
In any case, InnoDB is great, do not let some pitfalls discourage you.
Just altering the table and setting the engine should be fine.
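A minimal sketch of the conversion itself (the table name is hypothetical):

ALTER TABLE my_table ENGINE=InnoDB;
SHOW TABLE STATUS LIKE 'my_table'; /* confirm Engine: InnoDB */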
One of the big ones to watch out for is that SELECT COUNT(*) FROM MyTable is much slower in InnoDB than in MyISAM.
auto_increment values will reset to the highest value in the table + 1 after a server restart; this can cause funny problems if you have a messy db with some deletes.
Optimum server settings are going to be different to a mainly MyISAM db.
Make sure the InnoDB data file is big enough to hold all your data, or you'll be crucified by constant reallocation when you change the engines of the tables.
If you are intending to use InnoDB as a way to get concurrent queries, then you will want to set innodb_flush_log_at_trx_commit = 2 so you get some performance back. OTOH, if you were looking to re-code your application to be transaction-aware, then deciding on this setting will be part of the general performance review needed of the InnoDB settings.
The other major thing to watch out for is that InnoDB does not support FULLTEXT indexes, nor INSERT DELAYED. But then, MyISAM doesn't support referential integrity. :-)
However, you can move over only the tables that need to be transaction-aware. I've done this. Small tables (up to several thousand rows) can often be changed on the fly, incidentally.
The performance characteristics can be different, so you may need to keep an eye on the load.
The data will be fine.