I like InnoDB's safety, consistency, and self-checking.
But I need MyISAM's speed and light weight.
How can I make MyISAM less prone to corruption due to crashes, bad data, etc.? It takes forever to go through a check (either CHECK TABLE or myisamchk).
I'm not asking for transactional security -- that's what InnoDB is for. But I do want a database I can restart quickly rather than hours (or days!) later.
UPDATE: I'm not asking how to load data into tables faster. I've beat my head against that already, and determined that using the MyISAM tables for my LOAD DATA is simply much faster. What I'm after now is mitigating the risks of using MyISAM tables. That is, reducing chances of damage, increasing speed of recovery.
MyISAM's supposed speed benefits can actually go away pretty quickly - the fact that it lacks row-level locking means small updates can cause large amounts of data to be locked, and queries to block. Because of that, I'm skeptical of claimed MyISAM speed benefits: start doing several UPDATEs, and the queries per second will tank.
I think you're better off asking "How can applications backed with InnoDB be made faster?" and the answer then deals with caching data, perhaps at the object level, in lightweight caches - there is a cost for ACID, and for, say, web applications, it's not really needed.
If UPDATEs are rare (if they aren't, MyISAM isn't a good choice) then you can even use the MySQL query cache.
memcached (http://www.danga.com/memcached/) is a very popular option for object caching. Depending on your application you have other options as well (HTTP caches, etc.)
The performance advantages of MyISAM are actually pretty minimal in some cases; you need to benchmark your own application MyISAM vs InnoDB. Using the InnoDB transactional engine exclusively gives other benefits too.
In my testing InnoDB will use up typically about 150% more disc space than MyISAM- this is because of its block structure and lack of index compression.
If you can afford it, just use InnoDB instead.
As far as answering your actual question goes: If you partition your table into multiple MyISAM tables, the amount of repair needed in a crash will be much less; if your data are large, this might be a good idea anyway for other reasons.
in normal practice, you shouldn't get corruption. if you are getting corruption, you need to look at things like bad memory, bad hard drive, bad drive controller, or possibly a mysql bug.
if you want to side-step all that, you could set up a replication slave. when the master dies, stop the replication on the slave and make it your new master. the clear the data off your old master and set it up as a slave. user down-time will be limited to the amount of time it takes to detect that the master died and bring the slave up.
this has the added benefit of being a good way to achieve a zero-downtime backup: shut down the slave process and back up the slave.
While I agree with the innodb comments, I will give a solution to your MyISAM problem.
A good way to prevent corruption and increasing speed would be to use MERGE tables
You can use 2 or more MyISAM files. One is usually for backup'd old data that isn't used that often and the other is newer data. Then you will have 2 FRM (the MyISAM table files) on your harddisk and one will be protected. Usually you compress the old MyISAM tables and then they will defiantly not be corrupted, since they become read-only.
This technique is usually used to speed up big MyISAM tables, but you can apply it here as well.
Hope that helped your question. While I realize it didn't really help crash-proof MyISAM, it does give quite a bit of protection.
Are you married to MySQL? Postgres is ACID-compliant (like innoDB) and (when well-tuned) nearly as speedy as MyISAM.
Your comment:
No, the major problem is the amazingly
disk-intensive initial import of data
into the table. MyISAM time: 12
minutes. InnoDB time: 3+ hrs. After my
initial load, UPDATEs are non-existent
and INSERTs are rare. No known
solution to InnoDB's disappointing
load operation.
suggests dropping constraints and indexes, then enabling / rebuilding them after the load may significantly speed it up- I assume you tried that? Did that improve things?
This really depends a lot on how your use of the tables. If they are write heavy, then you may want to consider removing indexes, which will speed up the recovery time. If they are read heavy, you may want to consider using replication which will serialise all writes to your tables, minimising the recovery time for your read copy after a crash.
Once thing you could do is write to an InnoDB copy of the table, and then replicate to a MyISAM copy. The performance benefits of MyISAM are mostly read-oriented anyway.
Using replication of course, you will have lag time between reads and writes
Get a good UPS, with decent power conditioning. Run on stable and redundant hardware.
I don't trust MyISAM tables to ever survive a crash during a write, so I think your best bet is on reducing the occurrence of crashes (and writes).
Related
Many sites and script still use MySQL instead of PostgreSQL. I have a couple low-priority blogs and such that I don't want to migrate to another database so I'm using MySQL.
Here's the problem, their on a low-memory VPS. This means I can't enable InnoDB since it uses about 80MB of memory just to be loaded. So I have to risk running MyISAM.
With that in mind, what kind of data loss am I looking at with MyISAM? If there was a power-outage as someone was saving a blog post, would I just lose that post, or the whole database?
On these low-end-boxes I'm fine with losing some recent comments or a blog post as long as the whole database isn't lost.
MyISAM isn't ACID compliant and therefore lacks durability. It really depends on what costs more...memory to utilise InnoDB or downtime. MyISAM is certainly a viable option but what does your application require from the database layer? Using MyISAM can make life harder due to it's limitations but in certain scenarios MyISAM can be fine. Using only logical mysqldump backups will interrupt your service due to their locking nature. If you're utilising binary logging you can back these up to give you incremental backups that could be replayed to aid recovery should something corrupt in the MyISAM tables.
You might find the following MySQL Performance article of interest:
For me it is not only about table locks. Table locks is only one of MyISAM limitations you need to consider using it in production. Especially if you’re comming from “traditional” databases you’re likely to be shocked by MyISAM behavior (and default MySQL behavior due to this) – it will be corrupted by unproper shutdown, it will fail with partial statement execution if certain errors are discovered etc...
http://www.mysqlperformanceblog.com/2006/06/17/using-myisam-in-production/
The MySQL manual points out the types of events that can corrupt your table and there is an article explaining how to use myisamchk to repair tables. You can even issue a query to fix it.
REPAIR TABLE table;
However, there is no information about whether some types of crashes might be "unfix-able". That is the type of data loss that I can't allow even if I'm doing backups.
With a server crash your auto increment primary key can get corrupted, so your blog post IDs can jump from 122, 123, 75912371234, 75912371235 (where the server crashed after 123). I've seen it happen and it's not pretty.
You could always get another host on the same VLAN that is slaved to your database as a backup, this would reduce the risk considerably. I believe the only other options you have are:
Get more RAM for your server or kill of some services
See if your host has shared database hosting of any kind on the VLAN you can use for a small fee.
Make regular backups and be prepared for the worst.
In my humble opinion, there is no kind of data loss with MyISAM.
The risk of data loss from a power outage is due to the power outage, not the database storage mechanism.
Does having separate file for each table improve InnoDB performance in MySQL. Are there any such performance tuning tips for MySQL
"Using multiple tablespaces can be beneficial to users who want to move specific tables to separate physical disks or who wish to restore backups of single tables quickly without interrupting the use of other InnoDB tables."
So any speed bump may be dependent upon the architecture of your system and how good it is at parallel reads and writes. If you keep everything on the same disk then I wouldn't have thought that it makes much difference.
Have a look at these articles:
http://mtocker.livejournal.com/42180.html
http://www.bigdbahead.com/?p=57
It turns out that there is a small performance penalty for having data in many files, unless you are running some exotic file system which has high overhead on working with large files. I have experienced something similar with DRBD, however I haven't done enough investigation to claim this officially.
I personally go for separate files whenever I can, especially for bigger schemas.
I currently have an application that is using 130 MySQL table all with MyISAM storage engine. Every table has multiple queries every second including select/insert/update/delete queries so the data and the indexes are constantly changing.
The problem I am facing is that the hard drive is unable to cope, with waiting times up to 6+ seconds for I/O access with so many read/writes being done by MySQL.
I was thinking of changing to just 1 table and making it memory based. I've never used a memory table for something with so many queries though, so I am wondering if anyone can give me any feedback on whether it would be the right thing to do?
One possibility is that there may be other issues causing performance problems - 6 seconds seems excessive for CRUD operations, even on a complex database. Bear in mind that (back in the day) ArsDigita could handle 30 hits per second on a two-way Sun Ultra 2 (IIRC) with fairly modest disk configuration. A modern low-mid range server with a sensible disk layout and appropriate tuning should be able to cope with quite a substantial workload.
Are you missing an index? - check the query plans of the slow queries for table scans where they shouldn't be.
What is the disk layout on the server? - do you need to upgrade your hardware or fix some disk configuration issues (e.g. not enough disks, logs on the same volume as data).
As the other poster suggests, you might want to use InnoDB on the heavily written tables.
Check the setup for memory usage on the database server. You may want to configure more cache.
Edit: Database logs should live on quiet disks of their own. They use a sequential access pattern with many small sequential writes. Where they share disks with a random access work load like data files the random disk access creates a big system performance bottleneck on the logs. Note that this is write traffic that needs to be completed (i.e. written to physical disk), so caching does not help with this.
I've now changed to a MEMORY table and everything is much better. In fact I now have extra spare resources on the server allowing for further expansion of operations.
Is there a specific reason you aren't using innodb? It may yield better performance due to caching and a different concurrency model. It likely will require more tuning, but may yield much better results.
should-you-move-from-myisam-to-innodb
I think that that your database structure is very wrong and needs to be optimised, has nothing to do with the storage
What are the best practices for optimizing a MySQL installation for best performance when handling somewhat larger tables (> 50k records with a total of around 100MB per table)? We are currently looking into rewriting DelphiFeeds.com (a news site for the Delphi programming community) and noticed that simple Update statements can take up to 50ms. This seems like a lot. Are there any recommended configuration settings that we should enable/set that are typically disabled on a standard MySQL installation (e.g. to take advantage of more RAM to cache queries and data and so on)?
Also, what performance implications does the choice of storage engines have? We are planning to go with InnoDB, but if MyISAM is recommended for performance reasons, we might use MyISAM.
The "best practice" is:
Measure performance, isolating the relevant subsystem as well as you can.
Identify the root cause of the bottleneck. Are you I/O bound? CPU bound? Memory bound? Waiting on locks?
Make changes to alleviate the root cause you discovered.
Measure again, to demonstrate that you fixed the bottleneck and by how much.
Go to step 2 and repeat as necessary until the system works fast enough.
Subscribe to the RSS feed at http://www.mysqlperformanceblog.com and read its historical articles too. That's a hugely useful resource for performance-related wisdom. For example, you asked about InnoDB vs. MyISAM. Their conclusion: InnoDB has ~30% higher performance than MyISAM on average. Though there are also a few usage scenarios where MyISAM out-performs InnoDB.
InnoDB vs. MyISAM vs. Falcon benchmarks - part 1
The authors of that blog are also co-authors of "High Performance MySQL," the book mentioned by #Andrew Barnett.
Re comment from #ʞɔıu: How to tell whether you're I/O bound versus CPU bound versus memory bound is platform-dependent. The operating system may offer tools such as ps, iostat, vmstat, or top. Or you may have to get a third-party tool if your OS doesn't provide one.
Basically, whichever resource is pegged at 100% utilization/saturation is likely to be your bottleneck. If your CPU load is low but your I/O load is at its maximum for your hardware, then you are I/O bound.
That's just one data point, however. The remedy may also depend on other factors. For instance, a complex SQL query may be doing a filesort, and this keeps I/O busy. Should you throw more/faster hardware at it, or should you redesign the query to avoid the filesort?
There are too many factors to summarize in a StackOverflow post, and the fact that many books exist on the subject supports this. Keeping databases operating efficiently and making best use of the resources is a full-time job requiring specialized skills and constant study.
Jeff Atwood just wrote a nice blog article about finding bottlenecks in a system:
The Computer Performance Shell Game
Go buy "High Performance MySQL" from O'Reilly. It's almost 700 pages on the topic, so I doubt you'll find a succinct answer on SO.
It's hard to broadbrush things, but a moderately high-level view is possible.
You need to evaluate read:write ratios. For tables with ratios lower than about 5:1, you will probably benefit from InnoDB because then inserts won't block selects. But if you aren't using transactions, you should change innodb_flush_log_at_trx_commit to 1 to get performance back over MyISAM.
Look at the memory parameters. MySQL's defaults are very conservative and some of the memory limits can be raised by a factor of 10 or more on even ordinary hardware. This will benefit your SELECTs rather than INSERTs.
MySQL can log things like queries that aren't using indices, as well as queries that just take too long (user-defineable).
The query cache can be useful, but you need to instrument it (i.e. see how much it is used). Cacti can do that; as can Munin.
Application design is also important:
Lightly caching frequently fetched but smallish datasets will have a big difference (i.e. cache lifetime of a few seconds).
Don't re-fetch data that you already have to hand.
Multi-step storage can help with a high volume of inserts into tables that are also busily read. The basic idea is that you can have a table for ad-hoc inserts (INSERT DELAYED can also be useful), but a batch process to move the updates within MySQL from there to where all the reads are happening. There are variations of this.
Don't forget that perspective and context are important, too: what you might think is a long time for an UPDATE to happen might actually be quite trivial if that "long" update only happens once a day.
There are tons of best practices which have been previously discussed so there is no reason to repeat them. For actually concrete advice on what to do, I would try running MySQL Tuner. Its a perl script that you can download and then run on your database server, it will give you a bunch of statistics on how your database is performing (e.g. cache hits) along with some concrete recommendations for what issues or config parameters need to be adjusted to improve performance.
While these statistics are all available in MySQL itself, I find that this tool provides them in a much easier to understand fashion. While it is important to note that YMMV with respect to the recommendations, I have found them to generally be pretty accurate. Just make sure that you have done a good job exercising the database beforehand with realistic traffic.
I am currently working for a company that has a website running mysql/php (all tables are also using the MYISAM table type).
We would like to implement replication, but I have read in the mysql docs and elsewhere on the internet that this will lock the tables when doing the writes to the binary log (which the slave dbs will eventually read from).
Will these locks cause a problem on a live site that is fairly write-heavy? Also, is there a way to enable replication without having to lock the tables?
If you change your table types to innodb, row level locking is used. Also, your replication will be more stable, as updates will be transactional. MyISAM replication is a long-term pain.
Be sure that your servers are version-matched, and ALWAYS be sure to shut down the master before shutting down the slaves. You can bring the master up again immediately after shutting down the slaves, but you do have to take it down.
Also, make sure you use appropriate autoextend options for InnoDB. And, while you're at it, you'll probably want to migrate away from float and double to 'decimal' (which means mysql 5.1.) That will save you some replication headaches.
That's probably a bit more than you asked for. Enjoy.
P.s., yes the myisam locks can cause problems. Also, innodb is slower than myisam, unless myisam is blocking for a huge select.
In my experience DBAing a write-heavy site, writing a binary log adds no perceivable problems with locking or performance on the master. If you want to benchmark it, simply turn binary logging on. I really don't think tables are locked to write queries to the binary log.
Table locking on the slave is quite another thing, however. Replication is serial: each query runs to completion before the slave runs the next one. So long updates will cause replication to fall behind temporarily. If your application is intending to use replication for scale-out, it needs to know how to accomodate this.
The solution with the myisam table type is not 'better'. However, you can get by with it.
The best you can do, is make sure your slave and master run on the same hardware (FPU differences can create replication errors), as well as making sure you are running the same version numbers on your MySQL servers.
The following link answers your questions. Specifically, locks in MyISAM tables have less of a chance of blocking writes if there are no deletes going on. So a table that doesn't have delete holes in it will perform faster in a replicated setup.
http://dev.mysql.com/doc/refman/5.1/en/internal-locking.html
You can mitigate the effect of 'holes' by have a DBA export/import periodically during scheduled downtimes (especially after mass deletes.) Also, make sure your slave databases don't go down with the master still running. That will save you many, many issues.