Cloud based LAMP cluster - mysql

I run a pretty customized cluster for processing large amounts of scientific data based on a basic LAMP design. In general, I run a separate MySQL server with around 128GB of ram and about 1TB of storage. Separately, I run a head node that serves as an nfs mount point for the data input of my process, and a webserver to display results. Finally, I trypically have a few compute nodes that get their jobs from a mysql table, get the data from NFS, do some heavy lifting, then put results into mysql.
I have come across a dataset I would like to process which is pretty large (1TB of input data), and I don't really have the hardware on hand to handle it. As a result, I began investigating google compute engine etc, and the prospect of scaling instances to process these data rapidly with the results stored in a mysql instance. Upon completion the mysql tables could be dumped from the cloud and brought up locally for analysis. I would have no problem deploying a MySQL server, along with the rest of the LAMP pieces and the compute nodes, but I can't quite figure out how I would do this in the cloud.
A major sticking point seems to be the lack of read/write NFS which would allow me to get the data onto several instances, crunch it, then push the results to MySQL. This is a necessary step for me as I could queue hundreds of jobs from the webserver, then have the instances (as many as 50-100) pick the jobs up by connecting to a centralized mysql instance to find out what jobs an instance needs to do and where the data is. Process the data (there is a file conversion that happens which make the write part necessary), crunch the data, then load results to mysql. I hope I'm explaining my situation clearly. This seems like a great example of a CPU intensive process that would scale nicely in the cloud, I just can't seem to put all the pieces together... Any input is appreciated!

It sounds quite possible; I've been doing similar things in GCE for a while now.
NFS mount - you just need configure it as you would normally. Set up the NFS server on the head node, and then configure the clients on the slave nodes to mount it. Here and here are some basic configuration instructions for Centos 6 I used to get NFS up and running.
Setting up a LAMP stack is very straightforward. These machines run pretty much vanilla Linux distros, so you can just use yum or apt-get to install components.
For the cluster, you will probably end up having an image for the head node you use once, and then another image for the slave nodes that you replicate for each one.
For the scheduler, I've used Condor and Sge successfully, but I'm sure the other ones would work just as well.
Hope this helps.

Related

Node and Deno servers accessing the same MySQL database

I want to test Node and Deno and try to redirect users via proxy to one MySQL DB.
How will it impact the database?
Can some timestamp conflicts via CRUD operations arise or does MySQL have some mechanism to cope with connections from multiple servers?
What about performance or memory footprint of the database in RAM? Will it be occupying the same amount of space as if there was only one server requesting the database to CRUD something?
What would happen if I added another server that will connect to the DB, for example, java or Go server?
It will virtually have no impact on the database other than having any other concurrent processes connecting to it.
This is not a deno issue but rather a database issue.
The exact same problems can happen even with your current single Node.js instance, because the nature of all systems these days is concurrent/parallel.
You might as well replace the Deno app with another Node.js instance, Java, etc. Or even your current Node.js app.
Data in a database can change once you loaded it to the client, and it is up to you to implement the code that will handle such scenarios.
The fact that MySQL is not "ACID" is neither negative nor relevant in and of itself because it is doesn't have context.
If you need complete absolute integrity on a registry make sure you lock it when you select it, but there will be a trade off.

What is an efficient way to maintain a local readonly copy of a live remote MySQL database?

I maintain a server that runs daily cron jobs to aggregate data sources and generate reports, accessible by a private Ruby on Rails application.
One of our data sources is a partial dump of one of our partner's databases. The partner runs an active application and the MySQL DB has hundreds of tables. They have given us read-only access to a relatively underpowered readonly slave of their application DB.
Because of latency issues and performance bottlenecking on their slave DB, we have been maintaining a limited local copy of their DB. We only need about 20 tables for our reports, so I only dump those tables. We also only need the data to a daily granularity, so realtime sync is not a requirement.
For a few months, I had implemented a nightly cron which streamed the dump of the necessary tables into a local production_tmp database. Then, when all tables were imported, I dropped production and renamed production_tmp to production. This was working until the DB grew to over 25GB, and we started running into disk space limitations.
For now, I have removed the redundancy step and am just streaming the dump straight into production on our local server. This feels a bit flimsy to me, and I would like to implement a safer approach. Also, currently doing the full dump/load takes our server over 2 hours, and I'd like to implement an approach that doesn't take as long. The database will only keep growing, so I'd like to implement something future proof.
Any suggestions would be appreciated!
I take it you have never heard of, or considered MySQL Replication?
The idea is that you do your backup & restore once, and then configure the replica to "subscribe" to a continuous stream of changes as they are made on the primary MySQL instance. Any change applied to the primary is applied automatically to the replica within seconds. You don't have to do the backup & restore procedure again, unless the replica gets damaged.
It takes some care to set up and keep working, but it's a much more efficient method of keeping two instances in sync.
#SusannahPotts mentions hot backup and/or incremental backup. You can get both of these features for free, without paying for MySQL Enterprise using Percona XtraBackup.
You can also consider using MySQL Transportable Tablespaces.
You'll need filesystem access to run either Percona XtraBackup or MySQL Enterprise Backup. It's not possible to use these physical backup tools for Amazon RDS, for example.
One alternative is to create a replication slave in the same network as the live system, and run Percona XtraBackup on that slave, where you do have filesystem access.
Another option is to stream the binary logs to another host (see https://dev.mysql.com/doc/refman/5.6/en/mysqlbinlog-backup.html) and then transfer them periodically to your local instance and replay them.
Each of these solutions has pros and cons. It's hard to recommend which solution is best for you, because you aren't sharing full details about your requirements.
This was working until the DB grew to over 25GB, and we started running into disk space limitations.
Some question marks "here":
Why don't you just increase the available Diskspace for your database? 25 GB seems nothing when it comes down to disk-space?
Why don't you modify your script to: download table1, import table1_tmp, drop table1_prod, rename table1_tmp to table1_prod; rinse and repeat.
Other than that:
Why don't you ask your partner for a system with enough performance to run your reports on? I'm quite sure, he would prefer this rather than having YOU download sensitive data every day to your "local site"?
Last thought (requires MySQL Enterprise Backup https://www.mysql.de/products/enterprise/backup.html):
Rather than dumping, downloading and importing 25 GB every day:
Create a full backup
Download and import
Use Differential or incremental backups from now.
The next day you download (and import) only the data-delta: https://dev.mysql.com/doc/mysql-enterprise-backup/4.0/en/mysqlbackup.incremental.html

Which MySQL configuration do I want for simple load balancing for a web application?

We are building a small advertising platform that will be used on several client sites. We want to setup multiple servers and load balancing (using Amazon Elastic Load Balancer) to help prevent downtime.
Our basic functions include rendering HTML for ads, recording click information (IP, user-agent, location, etc.), redirecting traffic with their click ID as a tracking variable (?click_id=XX), and basic data analysis for clients. It is very important that two different clicks don't end up with the same click ID.
I understand the basics of load balancing, but I'm not sure how to setup the MySQL server(s).
It seems there are a lot of options out there: master-master, master-slave, clusters, shards.
I'm trying to figure out what is best for us. The most important aspects we are looking for are:
Uptime - if one server goes down, automatically get picked up by another server.
Load sharing - keep CPU and RAM usage spread out.
From what I've read, it sounds like my best option might be a Master with 2 or more slaves. The Master would not be responsible for any HTTP traffic, that would go to the slaves only. The Master server would therefore only be responsible for database writes.
Would this slow down our click script? Since we have to insert first to get a click ID before redirecting, the Slave would have to contact the Master and then return with the data. Right now our click script is lightning fast and I'd like to keep it that way.
Also, what happens if the Master goes down? Would a slave be able to serve as the Master until the Master was back online?
If you use Amazon's managed database service, RDS, this will take a lot of the pain out of managing your database.
You can select the multi-AZ option on your master database instance to provide a redundant, synchronously replicated slave in another availability zone. In the event of a failure of the instance or the entire availability zone Amazon will automatically flip the A record pointing to your master instance to the slave in the backup AZ. This process, on vanilla MySQL or MariaDB, can take a couple of minutes during which time your application will be unable to write to the database.
You can also provision up to 5 read replicas for a MySQL or MariaDB instance that will replicate from the master asynchronously. You could then use an Elastic Load Balancer (or other TCP load balancer such as HAProxy or MariaDB's MaxScale for a more MySQL aware option) to distribute traffic across the read replicas. By default each of these read replicas will have a full copy of the master's data set but if you wanted to you could attempt to manually shard the data across these. You'd have to have some more complicated logic in your application or the load balancer to work out where to find the relevant shard of the data set though.
You can choose to promote a read replica into a stand alone master which will break replication to the master and give you a stand alone cluster which can then be reconfigured as to your previous setup (or something different if you want and just using the data set you had at the point of promotion). It doesn't sound like something you need to do here though.
Another option would be to use Amazon's own flavour of MySQL, Aurora, on RDS. Aurora is completely MySQL over the wire compatible so you can use whatever MySQL driver your application already uses to talk to it. Aurora will allow up to 15 read replicas and completely transparent load balancing. You simply provide your application with the Aurora cluster endpoint and then any writes will happen on the master and any reads will be balanced across however many read replicas you have in the cluster. In my limited testing, Aurora's failover between instances is pretty much instant too so that minimises down time in the event of a failure.

Copy tomcat and mysql from one Amazon EBS volume to another

I launched an Amazon EC2 with Amazon Linux and Amazon-EBS as root volume. I also started tomcat7 and mysql 5.5 on this EBS volume.
Later I decided to change from Amazon Linux to Ubuntu. To do that I need to launch another Amazon EC2 instance with a new EBS root volume. Now I want to copy tomcat7 and mysql from older EBS volume to new one. I have tables and data in mysql which I don't want to loose and an application running on tomcat. How to go about it?
A couple of thoughts and suggestions.
First, if you are going to be having any kind of significant load on your database, running it on EBS-backed volume is probably not a great idea as EBS-backed storage is incredibly slow relative to the machine's local/ephemeral storage (/mnt). Now obviously you don't want DB data on ephemeral storage, so there is really nothing you can do about it if you want to run MySQL on EC2. So my suggestion would be to utilize an RDS instance for your DB if your infrastructure requirements allow for it.
Second, if this is a production application, you are undoubtedly going to have some down time as you make this transition. The question is whether you need to absolutely minimize the amount of downtime. If so, then you need to have an idea as to the size of your database. Is it going to take a long time to dump/load? If not, you could probably just get your new instance up and running, and tested on an older copy of your database and then just dump and load the current database at the time of cutover.
If it is a large database then perhaps you can turn on MySQL binary logging. Then make a dump of the database at a known binary log position. Then install this dump on your new instance. Then when ready to cutover, you can replay the binary logs on the new instance to bring it current. Similarly, you could just set up the DB on the new instance as a replica until the cutover, at which point you make it the master.
You may even consider just using rsync to sync the physical database files if you don't want to mess with binary logging, though this can be a problematic approach if you are not that familiar with dealing with the actual physical database files.
As far as your application goes, that should be much simpler to migrate assuming it is just a collection of files. I would not copy the Tomcat7 installation itself, but rather just install Tomcat on Ubuntu and then adjust the configuration to match current.
As far as the cutover itself goes, this should be pretty straightforward and would vary in approach depending on whether you are using an elastic IP for your server or whether it is behind a load balancer,

How to clone mySQL continuously .. instantly on shared hosting

I have a MySQL install on a shared server and have access through phpMyAdmin. I want to make a continuous, real time clone of that database to a cloud mySQL database (we have created an Nginx-ready MySQL server specially for this database) I want to create a real time clone of the old one, then update code to point to the new database...
I think you will have difficulty doing real-time replication of a MySQL in a shared server environment. Since you appear to be moving db servers, I would be inclined to do a hot copy of your data, and install that on the new db server. At the same time as taking that copy, you should switch on query logging on your application.
Your switch over would then consist of running logged queries against the new database (faster than they were logged!) and finally, at a point that all logged queries have been run, switching the configuration of the app so that the new db is used.
Edit: the problem with a hot copy is that data is being written to the db at the same time as it is being copied. That means that the 'last updated' time will be different for each table. On that basis, is it possible in your application to set up a 'last_updated' column for each row? If so you will be able to tell for each table which logged queries still need to be copied.
What you're looking for is replication. It has far to many options to cover here in a single post.
http://dev.mysql.com/doc/refman/5.5/en/replication.html
If your going to do replication over the internet you'll want to secure it.Your host might allow a virtual local area network So this doesn't use up your bandwidth resources.
A great set of tools from percona you should look at are maatkit
https://launchpad.net/percona-toolkit
Documentation and usage examples
http://www.maatkit.org/doc/
It's good for other tasks but it also allows you to replicate a live database quickly.
When your working with live databases make sure your backups are upto date.