Magento Database Scaling? - mysql

I've been doing some work on WordPress and just stumbled across database scaling.
I was wondering if it was at all possible to implement this with Magento as we have an increasingly large database and thousands of products still to add (with various options for configurable products!!)

You can set up master-slave replication for MySQL so that the customer-facing server runs on the master database and an admin-only server runs on the slave database. Edit /app/etc/local.xml on the slave so that it reads from the slave but writes to the master. That way you can upload loads of products and run loads of reports, and the live system only has to slow down for the occasional write to the database. You can even have your slave system in your office at the end of an ADSL line.
Standard MySQL master-slave settings work fine for this, and the slave does rather well at catching up if it needs rebooting. The only gotcha of note is that the MySQL binary log files get big, so your my.cnf needs the setting that deletes log files after n days (expire_logs_days).
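To make the "read from the slave, write to the master" idea concrete, here is a minimal sketch of the routing decision in Python. It is not how Magento does it (Magento takes its connections from local.xml); the class name, the host names and the rule used to classify statements are all assumptions made up for illustration.

```python
# Minimal sketch of read/write splitting; host names are hypothetical.
READ_PREFIXES = ("SELECT", "SHOW", "DESCRIBE", "EXPLAIN")


class ReadWriteRouter:
    def __init__(self, master, replica):
        self.master = master
        self.replica = replica

    def route(self, sql):
        """Return the host that should run this statement."""
        statement = sql.lstrip().upper()
        # SELECT ... FOR UPDATE takes locks, so it has to go to the master.
        is_read = statement.startswith(READ_PREFIXES) and "FOR UPDATE" not in statement
        return self.replica if is_read else self.master


router = ReadWriteRouter(master="db-master.example", replica="db-slave.example")
print(router.route("SELECT sku FROM catalog_product_entity"))
print(router.route("UPDATE catalog_product_entity SET sku = 'x' WHERE entity_id = 1"))
```

Anything that is not clearly a read falls through to the master, which is the safe default when the slave may lag behind.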

Magento has recently certified ClustrixDB to be the scale-out database solution: http://www.marketwired.com/press-release/clustrix-selected-as-magento-technology-partner-2010010.htm
Disclaimer: I work at Clustrix and I know this is just a press release, but I have been also involved in the testing and it really works!

Disclaimer - I work at Clustrix.
Magento solution architect and certified developer Kevin Bortnick recently gave a webinar entitled "Scaling Techniques to Increase Magento Capacity" at Meet Magento NY. Kevin currently works for Clustrix, but before joining this year he worked on large Magento sites such as The Home Depot and Dr. Jays. In the webinar he shares his practical experience. He covers 3 front-end strategies and 8 RDBMS scaling strategies, including:
Percona
Faster hardware
Master with read slaves
Master/Master, Store/Admin
True Multi-master
Partitioning (Magento 2 feature CQRS)
NoSQL
ClustrixDB
I hope this webinar helps you find the right solution for your use case.

Related

MySQL Load Balancing

We have 6 Servers (4 Applications servers and 2 DB Servers)
We are using HAProxy to load balance between the Application and API servers (2/2)
Now the issue I'm having is that the system administrator set up master/slave replication on MySQL, but it keeps failing. So far we cannot use the slave: its data is frequently corrupted, we keep having to fix it, and each time we get different errors.
We tried to set up some sort of load balancing for reads and writes (write to the master, read from the slave), but we could not use it because the slave data is not always correct.
What I'm wondering is how the big guys proceed when dealing with high load servers where you always need the data to be accurate and cannot take any risk?
Can someone share their own experience and what they used?
What I found: Percona XtraDB Cluster, but before going in this direction I need input...
Thanks!
You can choose MySQL/MariaDB + Percona + HAProxy. This combination supports master-master synchronization, and the data sync works really well. Most real-time data synchronization setups have issues with primary and foreign keys; you can avoid those issues too using Percona. Go ahead, and good luck.
The "table is full" error means your slave doesn't have enough space to perform the ALTER TABLE. You need to get larger disks to resolve that error.
But the subtext is that no one is monitoring your database servers, and that's a bigger problem. You need to get a database administrator, or else get a professional service to do it.
What I'm wondering is how the big guys proceed when dealing with high load servers where you always need the data to be accurate and cannot take any risk?
First, get it out of your head that any system has no risk. That's impossible, if you plan to use the system at all. You can't eliminate the possibility of errors, but you can be prepared to recover from them seamlessly.
The big guys do the following:
Hire operations staff including system administrators, network administrators, database administrators to take care of the servers.
Monitor everything. Use software to track system load, disk space, errors, and many other things continuously. The best option is New Relic. For MySQL slave integrity, use a tool like pt-table-checksum.
Redundancy. Create standby systems and data to take over when (not if) the primary system fails.
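The slave integrity check mentioned above boils down to comparing checksums of the same chunks of rows on both servers. The sketch below shows only that idea in Python, with two in-memory SQLite databases standing in for master and slave. It is not how pt-table-checksum is implemented (as far as I know it runs its checksum queries on the master and lets them replicate), and the `orders` table, the chunk size and the function names are made up for the example.

```python
import hashlib
import sqlite3


def make_db(rows):
    """Build an in-memory stand-in for one server's copy of the table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn


def chunk_checksums(conn, chunk_size=2):
    """Return {first id of chunk: SHA-256 of the rows in that chunk}."""
    rows = conn.execute("SELECT id, total FROM orders ORDER BY id").fetchall()
    sums = {}
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        sums[chunk[0][0]] = hashlib.sha256(repr(chunk).encode()).hexdigest()
    return sums


def differing_chunks(master, replica):
    """List the chunks whose checksums do not match between the two copies."""
    m, r = chunk_checksums(master), chunk_checksums(replica)
    return sorted(key for key in m.keys() | r.keys() if m.get(key) != r.get(key))


master = make_db([(1, "10.00"), (2, "20.00"), (3, "30.00"), (4, "40.00")])
replica = make_db([(1, "10.00"), (2, "20.00"), (3, "30.00"), (4, "41.00")])
print(differing_chunks(master, replica))
```

Checksumming in chunks rather than per table means a mismatch points you at a small range of rows to repair instead of the whole table.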
You probably want to learn about the field of high availability architecture. Check out this talk: Scalable Internet Architectures
Get on Amazon EC2. You can launch 4 app servers along with 2 DB servers on the fly and set up load balancing using AWS features.
http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/elb-getting-started.html#define-load-balancer
https://aws.amazon.com/articles/1639

Should I be using MySQL Cluster, Master to Master replication, or something else?

I started just using a solo MySQL server and moved on to using Master to Slave replication. This is all inside our network. I have a cloud application where customers post orders to our system. When our ISP goes down it's a nightmare.
I'm looking to have one server on-site and one server off-site that can be in sync and if the one goes down the other one can take over and not miss a step. I have my DNS failover in place and 2 web servers, but I can't decide what I need to do for the MySQL servers.
I don't mind putting in the work to learn MySQL cluster, but I'm not sure if that is the correct solution or master to master or something else?
Scale: I have an orders table currently sitting at 150,000 rows and that could grow to 500,000 this year and possibly start getting into the millions over the next couple years.
Any advice would be greatly appreciated as I never had any formal schooling on the issue.
Thanks in advance.
The simplest way: just place your server in a reliable datacenter. That brings the failure rate down to a level you can handle semi-manually with a master-slave configuration.
If you need to host on-site, look into improving your site's connectivity the way datacenters do: have backup ISP channels with your own AS (IP autonomous system) and BGP routing, so that when one ISP fails even the IPs stay the same and traffic just shifts to the others.
Standard MySQL Server replication is asynchronous and does not give you conflict-safe master-master replication; only MySQL Cluster supports true multi-master. So if you need fast failover, the real question is whether to host it yourself or use DBaaS (database-as-a-service, folks like ClearDB).
DBaaS with an SLA and failover is quite expensive. It also adds network latency, because your own app and DB servers would most likely sit in the same datacenter while a hosted database would not. On the plus side, it is easier and faster to set up.

How can I sync a database driven website to a different server

I have a website using cPanel on a dedicated account. I would like to sync the website automatically to a second hosting company or perhaps to a local (in-house) server.
Basically this is a type of replication. The website is database driven (MySQL), so ideally it would sync everything (content, database, emails, etc.), but the most important part is syncing the website files and the database.
I'm not so much looking for a fail-over solution as an automatic replication solution, so if the primary site (Server) goes off-line, I can manually bring up the replicated site quickly.
I'm familiar with tools like unison and rsync, but most of these only sync file(s) and do not do too well with open database connections.
Don't use one tool when two are better: use rsync for files, but use replication for MySQL.
If, for some reason, you don't want to use replication, you might want to consider using DRBD. This is of course only applicable if you're running Linux. DRBD is now part of the main kernel (since version 2.6.33).
And yes, I am aware of at least one large enterprise deployment of DRBD which is used, among other things, to store MySQL database files. In fact, the MySQL website even has a relevant page on this topic.
You might also want to Google for articles against the DRBD/MySQL combination; I remember reading a few posts on that.

Which server should I choose for MySQL: Windows or Unix/Linux/Ubuntu/Debian?

I'm working on a SaaS project and MySQL is our main database. Our application is written in C# .NET and runs on a Windows 2003 server.
Considering maintenance, cost, options and performance, which server platform should I choose for MySQL hosting: Windows or Unix/Linux/Ubuntu/Debian?
The scenario is as follows:
The server I run today has a moderate transaction volume. The databases grow by 5MB daily, we expect that to reach 50MB in a couple of months, and the system is mission critical.
I don't know how big the database is going to be. We rent a VPS to host application and database server.
Most of our queries are simple, but our ORM tool constantly makes use of subqueries. We also run reports, both simple and heavy ones. Some of them run when a user clicks, but most run in order from a queue.
Buying extra co-lo space would be nice as we get more clients. It's a SaaS project, after all.
When developing, you can use your Windows box to also run a MySQL server. If and when you want to have your DBMS on a separate server, it can be either a Windows or a Linux server.
MySQL and its supporting tools for backup etc. probably offer more choices on Linux.
There are also 3rd party suppliers who will host your MySQL database on their servers. The benefit is they will handle backups, maintenance etc.
Also: look into phpMyAdmin for use as a great admin tool.
Larry
I think you need more information to make an informed decision. It's hard to just pull out a "best" answer based on no specific information.
What is your expected transaction volume?
How big will the database get?
How complex are your queries, i.e. are they long running or relatively quick?
Are you hosting the application on your own server at your own location? If you have to buy extra co-lo space maybe an extra server isn't the best option.
How "mission critical" is this database? I.e. maybe you need replicated servers to ensure stability.
There is a server sizing tool online at http://www.sizinglounge.com/, so you should check that out. It sounds like your server could be smaller than their smallest tier, but it should be a good place to start.
If this is a mission critical application you need to do some kind of replication to an extra server in case the primary one fails, so you are definitely looking at two systems. This has to be in addition to a good backup plan.
Given that you are uncertain about how big it could get you might just continue renting a server. For your backup one idea would be to look at running MySQL on an Amazon EC2 instance. BTW it is important to have a remote replicated server. If you have two systems next to each other and an environmental problem comes up, they could both be out of commission at the same time. But with a remote copy your options are open to potentially working around it.
If you run a lot of read-only queries locally and have your site hosted somewhere, it might make sense to set up a local replicated database copy to query against. That could potentially improve both your website and local performance quite a bit. Plus it would give you some peace of mind having a local copy under your control.
HTH,
Brandon

Which database has the best support for replication

I have a fairly good feel for what MySQL replication can do. I'm wondering what other databases support replication, and how they compare to MySQL and others?
Some questions I would have are:
Is replication built in, or an add-on/plugin?
How does the replication work (high-level)? MySQL provides statement-based replication (and row-based replication in 5.1). I'm interested in how other databases compare. What gets shipped over the wire? How do changes get applied to the replicas?
Is it easy to check consistency between master and slaves?
How easy is it to get a failed replica back in sync with the master?
Performance? One thing I hate about MySQL replication is that it's single-threaded, and replicas often have trouble keeping up, since the master can be running many updates in parallel, but the replicas have to run them serially. Are there any gotchas like this in other databases?
Any other interesting features...
MySQL's replication is weak inasmuch as one needs to sacrifice other functionality to get full master/master support (due to the restriction on supported backends).
PostgreSQL's replication is weak inasmuch as only master/standby is supported built-in (using log shipping); more powerful solutions (such as Slony or Londiste) require add-on functionality. Archive log segments are shipped over the wire, which are the same records used to make sure that a standalone database is in a working, consistent state on unclean startup. This is what I'm using presently, and we have resynchronization (and setup, and other functionality) fully automated. None of these approaches are fully synchronous. More complete support will be built in as of PostgreSQL 8.5.
Log shipping does not allow databases to come out of synchronization, so there is no need for processes to test the synchronized status. Bringing the two databases back into sync involves setting the backup flag on the master, rsyncing to the slave (with the database still running; this is safe), and unsetting the backup flag (and restarting the slave process) with the archive logs generated during the backup process available. My shop has this process (like all other administration tasks) automated.
Performance is a nonissue, since the master has to replay the log segments internally anyhow in addition to doing other work; thus, the slaves will always be under less load than the master.
Oracle's RAC (which isn't properly replication, as there's only one storage backend -- but you have multiple frontends sharing the load, and can build redundancy into that shared storage backend itself, so it's worthy of mention here) is a multi-master approach far more comprehensive than other solutions, but is extremely expensive. Database contents aren't "shipped over the wire"; instead, they're stored to the shared backend, which all the systems involved can access. Because there is only one backend, the systems cannot come out of sync.
Continuent offers a third-party solution which does fully synchronous statement-level replication with support for all three of the above databases; however, the commercially supported version of their product isn't particularly cheap (though vastly less expensive). Last time I administered it, Continuent's solution required manual intervention for bringing a cluster back into sync.
I have some experience with MS-SQL 2005 (publisher) and SQLEXPRESS (subscribers) with overseas merge replication. Here are my comments:
1 - Is replication built in, or an add-on/plugin?
Built in
2 - How does the replication work (high-level)?
There are different ways to replicate, from snapshot replication (giving static data at the subscriber level) to transactional replication (each INSERT/DELETE/UPDATE instruction is executed on all servers). Merge replication replicates only final changes (successive UPDATEs on the same record are applied at once during replication).
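The "only final changes" behaviour of merge replication can be pictured as collapsing a change log down to one net change per record. This is just a sketch of that idea in Python, not SQL Server's actual mechanism, and the function name and record layout are invented for the example.

```python
def collapse_changes(change_log):
    """Fold successive updates on the same record into one net change each.

    change_log is a list of (record_id, {column: new_value}) tuples in the
    order the updates happened.
    """
    net = {}
    for record_id, values in change_log:
        # Later values for the same column overwrite earlier ones.
        net.setdefault(record_id, {}).update(values)
    return list(net.items())


change_log = [
    (1, {"qty": 5}),
    (2, {"qty": 1}),
    (1, {"qty": 7}),
    (1, {"price": 9}),
]
print(collapse_changes(change_log))
```

Four logged updates turn into two changes to send, which matters a lot when the link between publisher and subscriber is slow.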
3 - Is it easy to check consistency between master and slaves?
Something I have never done ...
4 - How easy is it to get a failed replica back in sync with the master?
The basic resync process is just a double-click... But if you have 4 GB of data to reinitialize over a 64 Kb connection, it will be a long process unless you customize it.
5 - Performance?
Well... You will of course have a bottleneck somewhere, be it your connection, the volume of data, or your server's performance. In my configuration, users only write to subscribers, which all replicate with the main database (the publisher). That server is therefore never solicited by end users, and its CPU is strictly dedicated to data replication (to multiple servers) and backup. Subscribers are dedicated to clients and a single replication (to the publisher), which gives a very interesting result in terms of data availability for end users. Replications between the publisher and subscribers can be launched together.
6 - Any other interesting features...
It is possible, with some anticipation, to keep developing the database without even stopping the replication process... tables (in an indirect way), fields and rules can be added and replicated to your subscribers.
Configurations with a main publisher and multiple subscribers can be VERY cheap (when compared to some others...), as you can use the free SQLEXPRESS on the subscriber's side, even when running merge or transactional replication.
Try Sybase SQL Anywhere
Just adding to the options with SQL Server (especially SQL 2008, which has Change Tracking features now). Something to consider is the Sync Framework from Microsoft. There are a few options there, from the basic hub-and-spoke architecture, which is great if you have a single central server and sometimes-connected clients, right through to peer-to-peer sync, which gives you the ability to do much more advanced syncing with multiple 'master' databases.
The reason you might want to consider this instead of traditional replication is that you have a lot more control from code, for example you can get events during the sync progress for Update/Update, Update/Delete, Delete/Update, Insert/Insert conflicts and decide how to resolve them based on business logic, and if needed store the loser of the conflict's data somewhere for manual or automatic processing. Have a look at this guide to help you decide what's possible with the different methods of replication and/or sync.
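To illustrate what "decide how to resolve them based on business logic" can look like, here is a small sketch of one possible policy (last writer wins, loser kept for review). It is written in Python purely for illustration and does not use the Sync Framework API, which is .NET; the `Change` class, its fields and the `resolve` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Change:
    record_id: int
    value: str
    modified_at: int  # logical timestamp; a hypothetical field for this sketch


def resolve(local, remote, losers):
    """Last-writer-wins: return the newer change and keep the loser for review."""
    if local.modified_at >= remote.modified_at:
        winner, loser = local, remote
    else:
        winner, loser = remote, local
    losers.append(loser)  # stored for manual or automatic processing later
    return winner


losers = []
local = Change(record_id=42, value="shipped", modified_at=10)
remote = Change(record_id=42, value="cancelled", modified_at=12)
winner = resolve(local, remote, losers)
print(winner.value, [c.value for c in losers])
```

A real resolver would usually branch on the conflict type (Update/Update, Update/Delete and so on) before applying a rule like this one.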
For the keen programmers the Sync Framework is open enough that you can have the clients connect via WCF to your WCF Service which can abstract any back-end data store (I hear some people are experimenting using Oracle as the back-end).
My team has just released a large project that involves multiple SQL Express databases syncing subsets of data from a central SQL Server database via WAN and Internet (slow dial-up connection in some cases) with great success.
MS SQL 2005 Standard Edition and above have excellent replication capabilities and tools. Take a look at:
http://msdn.microsoft.com/en-us/library/ms151198(SQL.90).aspx
It's pretty capable. You can even use SQL Server Express as a read-only subscriber.
There are a lot of different things which databases CALL replication. Not all of them actually involve replication, and those which do work in vastly different ways. Some databases support several different types.
MySQL supports asynchronous replication, which is very good for some things. However, there are weaknesses. Statement-based replication is not the same as what most (any?) other databases do, and doesn't always result in the expected behaviour. Row-based replication is only supported by a non production-ready version (but is more consistent with how other databases do it).
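A toy example of why statement-based replication "doesn't always result in the expected behaviour": if the statement calls something non-deterministic, re-running it on the replica gives a different result, whereas shipping the resulting row does not. The sketch below simulates this in Python with plain lists standing in for tables; the statement it imitates (an INSERT using UUID()) and all names are assumptions for illustration only.

```python
import uuid


def run_statement(table):
    # Stands in for a statement such as: INSERT INTO tokens VALUES (UUID())
    table.append(str(uuid.uuid4()))


master = []
run_statement(master)

# Statement-based: the replica re-executes the same statement itself.
statement_replica = []
run_statement(statement_replica)

# Row-based: the replica receives the row the master actually wrote.
row_replica = list(master)

print("statement-based copy matches master:", statement_replica == master)
print("row-based copy matches master:", row_replica == master)
```

The statement-based copy ends up with its own UUID, so the two servers silently diverge even though replication reported no error.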
Each database has its own take on replication, some involve other tools plugging in.
A bit off-topic but you might want to check Maatkit for tools to help with MySQL replication.
All the main commercial databases have decent replication, but some are more decent than others. IBM Informix Dynamic Server (version 11 and later) is particularly good. It actually has two systems: one for high availability (HDR - high-availability data replication) and the other for distributing data (ER - enterprise replication). And the Mach 11 features (RSS - remote standalone secondary, and SDS - shared disk secondary) are excellent too, doubly so in 11.50 where you can write to either the primary or secondary of an HDR pair.
(Full disclosure: I work on Informix software.)
I haven't tried it myself, but you might also want to look into OpenBaseSQL, which seems to have some simple to use replication built-in.
Another way to go is to run in a virtualized environment. I thought the data in this blog article was interesting
http://chucksblog.typepad.com/chucks_blog/2008/09/enterprise-apps.html
It's from an EMC executive, so obviously, it's not independent, but the experiment should be reproducible
Here's the data specific for Oracle
http://oraclestorageguy.typepad.com/oraclestorageguy/2008/09/to-rac-or-not-to-rac-reprise.html
Edit: If you run virtualized, then there are ways to make anything replicate
http://chucksblog.typepad.com/chucks_blog/2008/05/vmwares-srm-cha.html