I am currently working for a company that has a website running MySQL/PHP (all tables also use the MyISAM table type).
We would like to implement replication, but I have read in the MySQL docs and elsewhere on the internet that this will lock the tables when doing the writes to the binary log (which the slave databases will eventually read from).
Will these locks cause a problem on a live site that is fairly write-heavy? Also, is there a way to enable replication without having to lock the tables?
If you change your table types to InnoDB, row-level locking is used. Also, your replication will be more stable, as updates will be transactional. MyISAM replication is a long-term pain.
Be sure that your servers are version-matched, and ALWAYS be sure to shut down the master before shutting down the slaves. You can bring the master up again immediately after shutting down the slaves, but you do have to take it down.
Also, make sure you use appropriate autoextend options for InnoDB. And, while you're at it, you'll probably want to migrate away from float and double to 'decimal' (which means MySQL 5.1). That will save you some replication headaches.
That's probably a bit more than you asked for. Enjoy.
P.S. Yes, the MyISAM locks can cause problems. Also, InnoDB is slower than MyISAM, unless MyISAM is blocking for a huge select.
In my experience DBAing a write-heavy site, writing a binary log adds no perceivable problems with locking or performance on the master. If you want to benchmark it, simply turn binary logging on. I really don't think tables are locked to write queries to the binary log.
Table locking on the slave is quite another thing, however. Replication is serial: each query runs to completion before the slave runs the next one. So long updates will cause replication to fall behind temporarily. If your application intends to use replication for scale-out, it needs to know how to accommodate this.
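To make the serial-apply point concrete, here is a minimal Python sketch. It is a toy model, not MySQL code: it assumes each statement reaches the slave the instant it commits on the master and takes a fixed number of seconds to apply. The function name and the numbers are made up for illustration.

```python
def slave_lag(events):
    """events: list of (master_commit_time, apply_seconds), sorted by commit time.

    Returns, for each event, how many seconds after the master commit the slave
    finishes applying it when one SQL thread runs statements strictly in order.
    """
    lags = []
    slave_free_at = 0  # time at which the single SQL thread becomes idle
    for commit_time, apply_seconds in events:
        # A statement cannot start before all earlier ones have finished.
        start = max(commit_time, slave_free_at)
        slave_free_at = start + apply_seconds
        lags.append(slave_free_at - commit_time)
    return lags


# One 30-second UPDATE at t=0, then quick 1-second statements later on.
events = [(0, 30), (10, 1), (20, 1), (40, 1)]
print(slave_lag(events))  # [30, 21, 12, 1]
```

The quick statements committed at t=10 and t=20 inherit most of the long update's delay, and the lag only drops back to normal once the slave has drained its queue.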
The solution with the myisam table type is not 'better'. However, you can get by with it.
The best you can do is make sure your slave and master run on the same hardware (FPU differences can create replication errors), and that you are running the same MySQL version on all your servers.
The following link answers your questions. Specifically, locks in MyISAM tables have less of a chance of blocking writes if there are no deletes going on. So a table that doesn't have delete holes in it will perform faster in a replicated setup.
http://dev.mysql.com/doc/refman/5.1/en/internal-locking.html
You can mitigate the effect of 'holes' by having a DBA export/import periodically during scheduled downtimes (especially after mass deletes). Also, make sure your slave databases don't go down with the master still running. That will save you many, many issues.
I understand that mysql replication uses two different threads on slave -
Slave I/O thread to read binlog from master and write it to local relay log
Slave SQL thread to execute the SQL statements from the local relay log. This thread executes the UPDATE, DELETE and CREATE statements.
What about select queries on the slave? Can SELECT queries interfere with the replication process? Or is there a different thread that executes SELECT queries?
I mean, can slow select queries on the slave make replication lag behind the master?
In short, queries can interfere with replication. It is not the threading that matters here but the locking being applied (ACID vs. threading). An update query from the master that is being replicated to the slave can be blocked by a select query on the slave. However, the replication subsystem will deal with these locking issues most of the time. If you don’t mind dirty reads, you can set the transaction isolation level on the slave to something less restrictive to mitigate the risk. Make sure dirty reads are acceptable first; see this link for more information: http://dev.mysql.com/doc/refman/5.0/en/set-transaction.html
You are concerned about lag, and that is not something you can eliminate: in any replication scheme there will be lag. There is almost always a network between the master and the slave, which introduces lag right from the start. For example, large replicated statements could saturate the network bandwidth, and this is probably going to happen more often than a query blocking replication. Replication never was and never will be instantaneous. So your point about lag can be answered like this: lag is something you HAVE to deal with, not something you can completely eliminate.
Don't get me wrong: replication can be fast, but it is NEVER instantaneous.
Another thing to keep in mind is that replication can fail and you need to plan for that as well. It will happen at some point and being prepared for it is essential. So basically you will have lag no matter how you do replication and you need to be able to deal with it. Also be prepared for replication failing at some point and how you will recover from it.
While replication can be useful in many places, you need to make sure you are preparing for it on many levels, such as adequate network infrastructure, how to deal with it during disaster recovery (failover), monitoring it in production, and how to get it back online when it breaks.
We have to show a difference to show the advantages of using replication. We have two computers, linked by teamviewer so we can show our class what we are doing exactly.
Is it possible to show a difference in performance? (How long it takes to execute certain queries?)
What sort of queries should we test? (In other words, where is the difference between using and not using replication the biggest?)
How should we fill our database? How much data should be there?
Thanks a lot!
I guess the answer to the above questions depends on factors such as which storage engine you are using, size of the database, as well as your chosen replication architecture.
I don't think replication will have much of an impact on query execution for simple master->slave architecture. If however, you have an architecture where there are two masters: one handling writes, replicating to another master which exclusively handles reads, and then replication to a slave which handles backups, then you are far more likely to be able to present some of the more positive scenarios. Have a read up on locks and storage engines, as this might influence your choices.
One simple way to show how replication can be positive is to demonstrate a simple backup strategy. For example, taking hourly backups on the master server itself can bring the underlying application to a complete halt for the duration of the backup (taking backups using mysqldump locks the tables so that no read/write operations can occur), whereas replicating to a slave and taking backups from there avoids this effect.
If you want to show detailed statistics, it's probably better to look into some benchmarking/profiling tools (sysbench, mysqlslap and sql-bench, to name a few). This can become quite complex though.
It might also be worth looking at the Percona Toolkit and the Percona monitoring plugins here: http://www.percona.com/software/
Replication has several advantages:
Robustness is increased with a master/slave setup. In the event of problems with the master, you can switch to the slave as a backup
Better response time for clients can be achieved by splitting the load for processing client queries between the master and slave servers
Another benefit of using replication is that you can perform database backups using a slave server without disturbing the master.
Using replication is always a safe thing to do. You should always be replicating your production server; in case of failure it will be helpful.
You can show the Seconds_Behind_Master value when demonstrating replication performance. It indicates how “late” the slave is. This value should not be more than 600-800 seconds, but network latency does matter here.
Make sure that the master and slave servers are configured correctly.
Now you can stop the slave server and let the master server process some updates/inserts (bulk inserts). When you start the slave server again, you will see a larger Seconds_Behind_Master value, which should keep decreasing until it reaches 0.
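As a rough way to predict what the class will see during that catch-up, here is a back-of-the-envelope Python sketch. It is a simplification, not anything MySQL computes: it assumes the slave applies changes at a constant multiple of the rate at which the master produces them, and `catch_up_seconds` and its parameters are hypothetical names.

```python
def catch_up_seconds(backlog_seconds, apply_speedup):
    """Estimate how long a restarted slave needs to reach zero lag.

    backlog_seconds: lag at the moment the slave is restarted.
    apply_speedup:   seconds of master work the slave applies per wall-clock
                     second (assumed constant; must exceed 1 to catch up).
    """
    if apply_speedup <= 1:
        raise ValueError("the slave must apply faster than the master writes")
    # Every wall-clock second the master adds 1 s of work while the slave
    # removes apply_speedup seconds, so lag shrinks by (apply_speedup - 1).
    return backlog_seconds / (apply_speedup - 1)


print(catch_up_seconds(600, 4))  # 200.0
```

If the master is idle during the catch-up, the real figure will be lower; if the slave is also serving heavy SELECT traffic, it will likely be higher.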
There is a tool called MONyog - MySQL Monitor and Advisor which shows Replication status in real-time.
Also, which kind of replication to use, statement-based or row-based, is explained here:
http://dev.mysql.com/doc/refman/5.1/en/replication-sbr-rbr.html
I currently have a MySQL dual master replication (A<->B) set up and everything seems to be running swimmingly. I drew on the basic ideas from here and here.
Server A is my web server (a VPS). User interaction with the application leads to updates to several fields in table X (which are replicated to server B). Server B is the heavy-lifter, where all the big calculations are done. A cron job on server B regularly adds rows to table X (which are replicated to server A).
So server A can update (but never add) rows, and server B can add rows. Server B can also update fields in X, but only after the user no longer has the ability to update that row.
What kinds of potential disasters can I expect with this scenario if I go to production with it? Or does this seem OK? I'm asking mostly because I'm ignorant about whether any simultaneous operation on the table (from either the A copy or the B copy) can cause problems or if it's just operations on the same row that get hairy.
Dual master replication is messy if you attempt to write to the same database on both masters.
One of the biggest points of contention (and high blood pressure) is the use of autoincrement keys.
As long as you remember to set auto_increment_increment and auto_increment_offset, you can look up any data you want and retrieve auto-incremented ids.
You just have to remember this rule: if you read an id from serverX, you must look up the needed data from serverX using the same id.
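The effect of those two variables can be sketched in a few lines of Python. This is only a model of the documented behaviour (ids follow the series offset, offset + increment, offset + 2*increment, ...), not server code, and the function names are hypothetical.

```python
def next_auto_increment(current_max, increment, offset):
    """Smallest value greater than current_max in the series
    offset, offset + increment, offset + 2 * increment, ..."""
    if current_max < offset:
        return offset
    steps = (current_max - offset) // increment + 1
    return offset + steps * increment


def simulate(inserts, increment, offset):
    """Ids one server would generate for `inserts` rows on an empty table."""
    ids, current = [], 0
    for _ in range(inserts):
        current = next_auto_increment(current, increment, offset)
        ids.append(current)
    return ids


# Two masters: auto_increment_increment = 2 on both, offsets 1 and 2.
server_a = simulate(5, increment=2, offset=1)
server_b = simulate(5, increment=2, offset=2)
print(server_a)  # [1, 3, 5, 7, 9]
print(server_b)  # [2, 4, 6, 8, 10]
```

Because the two series never overlap, rows inserted on either master cannot collide on the auto-increment key when they are replicated to the other.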
Here is one saving grace for using dual master replication.
Suppose you have
two databases (db1 and db2)
two DB servers (serverA and serverB)
If you impose the following restrictions
all writes of db1 to serverA
all writes of db2 to serverB
then you are not required to set auto_increment_increment and auto_increment_offset.
I hope my answer clarifies the good, the bad, and the ugly of using dual master replication.
Here is a pictorial example of 4 masters using auto increment settings
Nice article from Percona on this subject
Master-master replication can be very tricky. Are you sure that this is the best solution for you? Usually it is used for load-balancing purposes (e.g. round-robin connections to your db servers) and sometimes when you want to avoid the replication lag effect. A big known issue is the auto_increment problem, which is supposedly solved by using different offset and increment values.
I think you should modify your configuration to simple master-slave by making A the master and B the slave, unless I am mistaken about the requirements of your system.
I think you can depend on Percona XtraDB Cluster (Feature 2: Multi-Master replication) rather than regular MySQL replication.
They promise the following:
By Multi-Master I mean the ability to write to any node in your cluster and do not worry that eventually you get out-of-sync situation, as it regularly happens with regular MySQL replication if you imprudently write to the wrong server.
With Cluster you can write to any node, and the Cluster guarantees consistency of writes. That is the write is either committed on all nodes or not committed at all.
The two important consequences of Multi-Master architecture.
First: we can have several appliers working in parallel. This gives us true parallel replication. Slave can have many parallel threads, and you can tune it by variable wsrep_slave_threads
Second: There might be a small period of time when the slave is out-of-sync from master. This happens because the master may apply event faster than a slave. And if you do read from the slave, you may read data that has not changed yet. You can see that from diagram. However you can change this behavior by using variable wsrep_causal_reads=ON. In this case the read on the slave will wait until event is applied (this however will increase the response time of the read). This gap between slave and master is the reason why this replication is named “virtually synchronous replication”, not real “synchronous replication”.
The described behavior of COMMIT also has the second serious implication.
If you run write transactions to two different nodes, the cluster will use an optimistic locking model.
That means a transaction will not check on possible locking conflicts during individual queries, but rather on the COMMIT stage. And you may get ERROR response on COMMIT. I am highlighting this, as this is one of incompatibilities with regular InnoDB, that you may experience. In InnoDB usually DEADLOCK and LOCK TIMEOUT errors happen in response on particular query, but not on COMMIT. Well, if you follow a good practice, you still check errors code after “COMMIT” query, but I saw many applications that do not do that.
So, if you plan to use Multi-Master capabilities of XtraDB Cluster, and run write transactions on several nodes, you may need to make sure you handle response on “COMMIT” query.
You can find it here along with a pictorial explanation.
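The advice about checking the result of COMMIT can be sketched as a retry loop. The Python below is a self-contained toy: `FakeConnection` and `CommitConflict` are stand-ins invented for this example, not a real driver API, and a real application would catch whatever error its own MySQL driver raises on a failed COMMIT.

```python
class CommitConflict(Exception):
    """Stand-in for the error a driver would raise when COMMIT fails."""


class FakeConnection:
    """Toy connection: the first `failures` commits raise CommitConflict."""

    def __init__(self, failures):
        self.failures = failures
        self.pending = []
        self.committed = []

    def execute(self, statement):
        self.pending.append(statement)

    def commit(self):
        if self.failures > 0:
            self.failures -= 1
            self.pending = []  # the whole transaction is rolled back
            raise CommitConflict("conflict detected at COMMIT")
        self.committed.extend(self.pending)
        self.pending = []


def run_transaction(conn, statements, max_attempts=3):
    """Replay the whole transaction if COMMIT itself reports a conflict."""
    for attempt in range(1, max_attempts + 1):
        for statement in statements:
            conn.execute(statement)
        try:
            conn.commit()
            return attempt
        except CommitConflict:
            if attempt == max_attempts:
                raise


conn = FakeConnection(failures=1)
attempts = run_transaction(conn, ["UPDATE x SET a = 1 WHERE id = 7"])
print(attempts, conn.committed)  # 2 ['UPDATE x SET a = 1 WHERE id = 7']
```

The key design point is that the statements are re-executed from the start on each attempt, since a failed COMMIT discards the entire transaction rather than just the last statement.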
From my rather extensive experience on this topic I can say you will regret writing to more than one master someday. It may be soon, it may not be for a long time, but it will happen. You will have two servers that each have some correct data and some wrong data, and you will either pick one as the authoritative source and throw the other away (probably without really knowing what you're throwing away) or you'll reconcile the two. No matter how you design it, you cannot eliminate the possibility of this happening, so it's a mathematical certainty that it will happen someday.
Percona (my employer) has handled probably several hundred cases of recovery after doing what you're attempting. Some of them take hours, some take weeks, one I helped with took a few months -- and that's with excellent tools to help.
Use a different replication technology or find a different way to do what you want to do. MMM will not help -- it will bring catastrophe sooner. You cannot do this with standard MySQL replication, with or without external tools. You need a replacement replication technology such as Continuent Tungsten or Percona XtraDB Cluster.
It's often easier to just solve the real need in some other fashion and give up multi-master writes, if you want to use vanilla MySQL replication.
Thanks for sharing my Master-Master MySQL cluster article. As Rolando clarified, this configuration is not suitable for most production environments due to the limitations of auto-increment support.
The most adequate way to get a MySQL cluster is using NDB, which requires at least 4 servers (2 management and 2 data nodes).
I have written a detailed article to get this running on two servers only, which is very similar to my previous article but using NDB instead.
http://www.hbyconsultancy.com/blog/mysql-cluster-ndb-up-and-running-7-4-and-6-3-on-ubuntu-server-trusty-14-04.html
Note that I always recommend analysing your needs and finding the most adequate solution; don't just look at the available solutions and try to figure out whether they fit your needs.
-Hatem
I would highly recommend looking into a tool that will manage this for you. Multi-master replication can be very troublesome if things go wrong.
I would suggest something like Percona XtraDB Cluster. I've been following this project, and it looks very cool. I definitely think it will be a game changer in the MySQL world. It's still in beta though.
Many sites and scripts still use MySQL instead of PostgreSQL. I have a couple of low-priority blogs and such that I don't want to migrate to another database, so I'm using MySQL.
Here's the problem: they're on a low-memory VPS. This means I can't enable InnoDB since it uses about 80MB of memory just to be loaded. So I have to risk running MyISAM.
With that in mind, what kind of data loss am I looking at with MyISAM? If there was a power-outage as someone was saving a blog post, would I just lose that post, or the whole database?
On these low-end-boxes I'm fine with losing some recent comments or a blog post as long as the whole database isn't lost.
MyISAM isn't ACID compliant and therefore lacks durability. It really depends on what costs more: memory to utilise InnoDB, or downtime. MyISAM is certainly a viable option, but what does your application require from the database layer? Using MyISAM can make life harder due to its limitations, but in certain scenarios MyISAM can be fine. Using only logical mysqldump backups will interrupt your service due to their locking nature. If you're utilising binary logging, you can back the binary logs up to give you incremental backups that could be replayed to aid recovery should something become corrupt in the MyISAM tables.
You might find the following MySQL Performance article of interest:
For me it is not only about table locks. Table locks are only one of the MyISAM limitations you need to consider when using it in production. Especially if you’re coming from “traditional” databases you’re likely to be shocked by MyISAM behavior (and default MySQL behavior due to this) – it will be corrupted by improper shutdown, it will fail with partial statement execution if certain errors are discovered etc...
http://www.mysqlperformanceblog.com/2006/06/17/using-myisam-in-production/
The MySQL manual points out the types of events that can corrupt your table and there is an article explaining how to use myisamchk to repair tables. You can even issue a query to fix it.
REPAIR TABLE tbl_name;
However, there is no information about whether some types of crashes might be "unfix-able". That is the type of data loss that I can't allow even if I'm doing backups.
With a server crash your auto-increment primary key can get corrupted, so your blog post IDs can jump from 122, 123 to 75912371234, 75912371235 (where the server crashed after 123). I've seen it happen and it's not pretty.
You could always get another host on the same VLAN that is slaved to your database as a backup; this would reduce the risk considerably. I believe the only other options you have are:
Get more RAM for your server or kill off some services.
See if your host has shared database hosting of any kind on the VLAN you can use for a small fee.
Make regular backups and be prepared for the worst.
In my humble opinion, there is no kind of data loss with MyISAM.
The risk of data loss from a power outage is due to the power outage, not the database storage mechanism.
I like InnoDB's safety, consistency, and self-checking.
But I need MyISAM's speed and light weight.
How can I make MyISAM less prone to corruption due to crashes, bad data, etc.? It takes forever to go through a check (either CHECK TABLE or myisamchk).
I'm not asking for transactional security -- that's what InnoDB is for. But I do want a database I can restart quickly rather than hours (or days!) later.
UPDATE: I'm not asking how to load data into tables faster. I've beat my head against that already, and determined that using the MyISAM tables for my LOAD DATA is simply much faster. What I'm after now is mitigating the risks of using MyISAM tables. That is, reducing chances of damage, increasing speed of recovery.
MyISAM's supposed speed benefits can actually go away pretty quickly - the fact that it lacks row-level locking means small updates can cause large amounts of data to be locked, and queries to block. Because of that, I'm skeptical of claimed MyISAM speed benefits: start doing several UPDATEs, and the queries per second will tank.
I think you're better off asking "How can applications backed with InnoDB be made faster?" and the answer then deals with caching data, perhaps at the object level, in lightweight caches - there is a cost for ACID, and for, say, web applications, it's not really needed.
If UPDATEs are rare (if they aren't, MyISAM isn't a good choice) then you can even use the MySQL query cache.
memcached (http://www.danga.com/memcached/) is a very popular option for object caching. Depending on your application you have other options as well (HTTP caches, etc.)
The performance advantages of MyISAM are actually pretty minimal in some cases; you need to benchmark your own application MyISAM vs InnoDB. Using the InnoDB transactional engine exclusively gives other benefits too.
In my testing InnoDB will typically use up about 150% more disc space than MyISAM; this is because of its block structure and lack of index compression.
If you can afford it, just use InnoDB instead.
As far as answering your actual question goes: If you partition your table into multiple MyISAM tables, the amount of repair needed in a crash will be much less; if your data are large, this might be a good idea anyway for other reasons.
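One way to picture the partitioning idea is to split by month, so that only the table being written at crash time is likely to need a check. The Python sketch below assumes a per-month naming scheme; the `events` base name and both helper functions are illustrative assumptions, not something the answer prescribes.

```python
from datetime import date


def table_for(day, base="events"):
    """Name of the per-month MyISAM table that holds rows for `day`."""
    return f"{base}_{day.year:04d}_{day.month:02d}"


def tables_needing_check(days_written_at_crash, base="events"):
    """Tables that were open for writing when the server went down;
    in this scheme the older, untouched tables can usually be skipped."""
    return sorted({table_for(d, base) for d in days_written_at_crash})


print(table_for(date(2009, 3, 14)))  # events_2009_03
print(tables_needing_check([date(2009, 3, 14), date(2009, 3, 31)]))
# ['events_2009_03']
```

With a year of data split this way, a check or repair after a crash would typically touch one monthly table instead of the whole data set.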
in normal practice, you shouldn't get corruption. if you are getting corruption, you need to look at things like bad memory, bad hard drive, bad drive controller, or possibly a mysql bug.
if you want to side-step all that, you could set up a replication slave. when the master dies, stop the replication on the slave and make it your new master. then clear the data off your old master and set it up as a slave. user down-time will be limited to the amount of time it takes to detect that the master died and bring the slave up.
this has the added benefit of being a good way to achieve a zero-downtime backup: shut down the slave process and back up the slave.
While I agree with the innodb comments, I will give a solution to your MyISAM problem.
A good way to prevent corruption and increase speed would be to use MERGE tables.
You can use two or more MyISAM tables. One usually holds old, backed-up data that isn't used that often, and the other holds newer data. Then you will have two sets of MyISAM table files on your hard disk (.frm, .MYD and .MYI for each table), and one will be protected. Usually you compress the old MyISAM tables, and then they will definitely not be corrupted, since they become read-only.
This technique is usually used to speed up big MyISAM tables, but you can apply it here as well.
Hope that helped your question. While I realize it didn't really help crash-proof MyISAM, it does give quite a bit of protection.
Are you married to MySQL? Postgres is ACID-compliant (like InnoDB) and (when well-tuned) nearly as speedy as MyISAM.
Your comment:
"No, the major problem is the amazingly disk-intensive initial import of data into the table. MyISAM time: 12 minutes. InnoDB time: 3+ hrs. After my initial load, UPDATEs are non-existent and INSERTs are rare. No known solution to InnoDB's disappointing load operation."
suggests that dropping constraints and indexes, then enabling/rebuilding them after the load, may significantly speed it up. I assume you tried that? Did that improve things?
This really depends a lot on how you use the tables. If they are write heavy, then you may want to consider removing indexes, which will speed up the recovery time. If they are read heavy, you may want to consider using replication, which will serialise all writes to your tables, minimising the recovery time for your read copy after a crash.
One thing you could do is write to an InnoDB copy of the table, and then replicate to a MyISAM copy. The performance benefits of MyISAM are mostly read-oriented anyway.
Using replication, of course, you will have lag time between reads and writes.
Get a good UPS, with decent power conditioning. Run on stable and redundant hardware.
I don't trust MyISAM tables to ever survive a crash during a write, so I think your best bet is on reducing the occurrence of crashes (and writes).