Locking DB w/ Large Reads (Ruby-on-Rails/Heroku) - mysql

Currently I have a Web API running on Heroku that is constantly writing information we're collecting from other data sources (currently theres about half a GB of data and it's growing very quickly). We're looking to add a reporting system on top of the current database that we can use to extract useful information out of the DB. The problem is that when we're running reports we're locking the DB and any other sites communicating with the DB are timing out. Does anyone have any solutions on how to solve this type of issue? Amazon RDS seems to have some interesting stuff with database replication but I don't know if that will solve my problems.
Any advice would be greatly appreciated.
Thanks

Be sure you are running innodb tables and not the old isam or myisam tables - innodb has row level locks which is much more scalable.
Make sure that you have indexes defined on all your joining/foreign keys... if you do joins without indexes it will grind. Also make sure you have indexes where appropriate for data that you search or sort on (as long as it is diverse data, not boolean or a small number of values)
Replication is another good idea, as you could target the reports at the secondary server in read-only mode, and it will just catch up once it unlocks. half a GB of data should not really be locking it up yet, so I'd look at the indexes and innodb first.

One solution to this is to have a replica of the database, so that your normal traffic goes to the master database, while long-running queries execute on the slave. I'm not sure how much control you get over the database on Heroku though, they may not support replication.
However, have you considered that the Heroku setup may be the problem here? A 500 MB database shouldn't really have performance issues unless you're performing really complex queries.
If you're happy using MySQL instead of Postgres, Engine Yard supports database replication (although generally it may not be as easy to use as Heroku).

Related

MySQL changing large table to InnoDB

I have a MySQL server running on CentOS which houses a large (>12GB) DB. I have been advised to move to InnoDB for performance reasons as we are experiencing lockups where the application that relies on the DB becomes unresponsive when the server is busy.
I have been reading around and can see that the ALTER command that changes the table to InnoDB is likely to take a long time and hammer the server in the process. As far as I can see, the only change required is to use the following command:
ALTER TABLE t ENGINE=InnoDB
I have run this on a test server and it seems to complete fine, taking about 26 minutes on the largest of the tables that needs to be converted.
Having never run this on a production system I am interested to know the following:
What changes are recommended to be made to the MySQL config to take advantage of additional performance of InnoDB tables? The server currently has 3GB assigned to InnoDB cache - was thinking of increasing this to 15GB once the additional RAM is installed.
Is there anything else I should do to the server with this change?
I would really recommend using either Percona MySQL or MariaDB. Both have tools that will help you get the most out of InnoDB, as well as some tools to help you diagnose and optimize your database further (for example, Percona's Online Schema Change tool could be used to alter your tables without downtime).
As far as optimization of InnoDB, I think most would agree that innodb_buffer_pool_size is one of the most important parameters to tune (and typically people set it around 70-80% of total available memory, but that's not a magic number). It's not the only important config variable, though, and there's really no magic run_really_fast setting. You should also pay attention to innodb_buffer_pool_instances (and there's a good discussion about this topic on https://dba.stackexchange.com/questions/194/how-do-you-tune-mysql-for-a-heavy-innodb-workload)
Also, you should definitely check out the tips offered in the MySQL documentation itself (http://dev.mysql.com/doc/refman/5.6/en/optimizing-innodb.html). It's also a good idea to pay attention to your InnoDB hit ratio (Rolado over at DBA Stackexchange has a great answer on this topic, eg, https://dba.stackexchange.com/questions/65341/innodb-buffer-pool-hit-rate) and analyze your slow query logs carefully. Towards that later end, I would definitely recommend taking a look at Percona again. Their slow query analyzer is top notch and can really give you a leg up when it comes to optimizing SQL performance.

Life without transactions (MyISAM)

My site runs on a VDS-server. I've just found out that my MySQL server doesn't support InnoDB engine, therefore I can't use database transactions in my application.
It makes me think, that some people might never use transactions. Is this the case? If so, how does one manage to coordinate related operations on different tables in MyISAM?
Otherwise, is there a way to install InnoDB on a MySQL server which is run on a VDS?
Thanks!
If you need transactions, then you need transactions and MyISAM isn't going to cut the mustard.
Some applications won't need transactions. For example; an application that never runs more than one related SQL statement at a time and has no need to rollback multiple SQL statements. Another example is an application that uses MySQL as a simple Key-Value Store. There are many use cases that don't require database level transaction support.
It's hard to answer the second part of your question without knowing more details about your VDS. Who is you hosting provider? Do you have shell access and permissions to change my.cnf? If not, then you probably won't be able to enable InnoDB. If you do, then here is a another SO answer that details how to enable InnoDB on MySQL: How to enable INNODB in mysql
You can often either enable the engine, install the InnoDB components manually, or simply re-install a version of MySQL that includes that engine by default. MyISAM is the crazypants database, stupidly fast but also unreliable and prone to complete destruction if your system isn't shut down properly.
Running a mission critical application on MyISAM is an extremely bad idea. Where you need MyISAM tables for performance reasons they should always be disposable, easily re-built from another more reliable source of data.

How do I set up slave MySQL database for reporting purposes only?

Given a large MySQL production database, optimized for fast inserts, how would I go about setting up a "slave" database that would be optimized for fast searches? In my head, the slave would basically be a replica of the main, but with significantly more indices across the whole database to speed up read access. Is this sort of customized master-to-slave replication possible?
Setting up master/slave replication?
This How to Set Up Replication in MySQL guide has both what you need to do on the master and the slave.
Found here on this MySQL Replication question:
I'd argue that you may want to consider an alternative database for your reporting side. You mentioned the need for heavy indexes -- why? Since I assume you're doing bulk loads of data and hardly modifying the data at all (ie: transactions), then you should consider a columnar database. If you're used to mysql syntax, then there are two column-oriented databases which use mysql as a front end. Thus, your queries will stay the same, but how the data is stored underneath will be completely different.
It really throws your question out and adds a new one in: what's the best way to move my data to allow it to be optimized for many reads and few writes.
Full Disclosure: I work as the open-source guy for Infobright (one of the column oriented techs). Your question is one I see a lot of, so I thought I'd give you my thoughts. However, there are several different columnar vendors; it all depends on volume, money, and time.

MySQL MyISAM data loss possibilities?

Many sites and script still use MySQL instead of PostgreSQL. I have a couple low-priority blogs and such that I don't want to migrate to another database so I'm using MySQL.
Here's the problem, their on a low-memory VPS. This means I can't enable InnoDB since it uses about 80MB of memory just to be loaded. So I have to risk running MyISAM.
With that in mind, what kind of data loss am I looking at with MyISAM? If there was a power-outage as someone was saving a blog post, would I just lose that post, or the whole database?
On these low-end-boxes I'm fine with losing some recent comments or a blog post as long as the whole database isn't lost.
MyISAM isn't ACID compliant and therefore lacks durability. It really depends on what costs more...memory to utilise InnoDB or downtime. MyISAM is certainly a viable option but what does your application require from the database layer? Using MyISAM can make life harder due to it's limitations but in certain scenarios MyISAM can be fine. Using only logical mysqldump backups will interrupt your service due to their locking nature. If you're utilising binary logging you can back these up to give you incremental backups that could be replayed to aid recovery should something corrupt in the MyISAM tables.
You might find the following MySQL Performance article of interest:
For me it is not only about table locks. Table locks is only one of MyISAM limitations you need to consider using it in production. Especially if you’re comming from “traditional” databases you’re likely to be shocked by MyISAM behavior (and default MySQL behavior due to this) – it will be corrupted by unproper shutdown, it will fail with partial statement execution if certain errors are discovered etc...
http://www.mysqlperformanceblog.com/2006/06/17/using-myisam-in-production/
The MySQL manual points out the types of events that can corrupt your table and there is an article explaining how to use myisamchk to repair tables. You can even issue a query to fix it.
REPAIR TABLE table;
However, there is no information about whether some types of crashes might be "unfix-able". That is the type of data loss that I can't allow even if I'm doing backups.
With a server crash your auto increment primary key can get corrupted, so your blog post IDs can jump from 122, 123, 75912371234, 75912371235 (where the server crashed after 123). I've seen it happen and it's not pretty.
You could always get another host on the same VLAN that is slaved to your database as a backup, this would reduce the risk considerably. I believe the only other options you have are:
Get more RAM for your server or kill of some services
See if your host has shared database hosting of any kind on the VLAN you can use for a small fee.
Make regular backups and be prepared for the worst.
In my humble opinion, there is no kind of data loss with MyISAM.
The risk of data loss from a power outage is due to the power outage, not the database storage mechanism.

Clustering, Sharding or simple Partition / Replication

We have created a Facebook application and it got a lot of virality. The problem is that our database started getting REALLY FULL (some tables have more than 25 million rows now). It got to the point that the app just stopped working because there was a queue of thousands and thousands of writes to be made.
I need to implement a solution for scaling this app QUICKLY but I'm not sure if I should pursue Sharding or Clustering since I'm not sure what are the pro's and con's of each of them and I was thinking of doing a Partition / Replication approach but I think that doesn't help if the load is on the writes?
25 million rows is a completely reasonable size for a well-constructed relational database. Something you should bear in mind, however, is that the more indexes you have (and the more comprehensive they are), the slower your writes will be. Indexes are designed to improve query performance at the expense of write speed. Be sure that you're not over-indexed.
What sort of hardware is powering this database? Do you have enough RAM? It's far easier to change these attributes than it is to try to implement complex RDBMS load balancing techniques, especially if you're under a time crunch.
Clustering/Sharding/Partitioning comes when single node has reached to the point where its hardware cannot bear the load. But your hardware has still room to expand.
This is the first lesson I learnt when I started being hit by such issues
Well, to understand that, you need to understand how MySQL handles clustering. There are 2 main ways to do it. You can either do Master-Master replication, or NDB (Network Database) clustering.
Master-Master replication won't help with write loads, since both masters need to replay every single write issued (so you're not gaining anything).
NDB clustering will work very well for you if and only if you are doing mostly primary key lookups (since only with PK lookups can NDB operate more efficient than a regular master-master setup). All data is automatically partitioned among many servers. Like I said, I would only consider this if the vast majority of your queries are nothing more than PK lookups.
So that leaves two more options. Sharding and moving away from MySQL.
Sharding is a good option for handling a situation like this. However, to take full advantage of sharding, the application needs to be fully aware of it. So you would need to go back and rewrite all the database accessing code to pick the right server to talk to for each query. And depending on how your system is currently setup, it may not be possible to effectively shard...
But another option which I think may suit your needs best is switching away from MySQL. Since you're going to need to rewrite your DB access code anyway, it shouldn't be too hard to switch to a NoSQL database (again, depending on your current setup). There are tons of NoSQL servers out there, but I like MongoDB. It should be able to withstand your write load without worry. Just beware that you really need a 64 bit server to use it properly (with your data volume).
Replication is for data backup not for performance so its out of question.
Well, 8GB RAM is still not that much you can have many hundred GB RAM with quite big hard disk space and MySQL would still work for you.
Clustering/Sharding/Partitioning comes when single node has reached to the point where its hardware cannot bear the load. But your hardware has still room to expand.
If you don't want to upgrade your hardware then you need to give more information about database design and if there are lot of joins or not so that above named options can be considered deeply.