Amazon EC2 backup options and differences - mysql

I have an EC2 instance running Windows Server 2012. The server runs a MySQL database, a WordPress website, and a web service, all in IIS. I installed and configured these manually, in case that matters.
I looked into methods for backing up, and came across EBS Snapshots (Elastic Block Store > Snapshots > Create Snapshot), and Images (Instances > Actions > Image > Create Image). From my understanding, an EBS Snapshot is a snapshot (backup) of any attached EBS volumes (in my case the root drive C). An Image is an image of the entire instance. Am I correct so far in my descriptions of the two methods?
I want to have a backup of my server as described above (database, WordPress, web service, IIS settings). Would an EBS Snapshot suffice for this? That is, if my instance or EBS volume fails one day, and I recreate an instance and attach a volume restored from the EBS snapshot to it, will my server be configured the same as the failed instance (database, WordPress, web service, IIS settings, etc.)?
I am assuming that restoring an Image to a new instance means absolutely everything will be restored as it was on the initial instance, correct?
So with all that said, would an EBS snapshot be enough as a backup solution?

An EBS Snapshot is a crash-consistent backup of your volume. Crash-consistent means it is as good as if your machine had crashed (for example, as if you had unplugged your computer). Open files with pending changes in buffers might not have been persisted to disk.
So it is not an application-consistent backup, such as those that can be made with applications that support VSS snapshots, but it will probably be good enough for basic disaster recovery.
Here you will find more info about crash consistent and application consistent backups.
Another important caveat is that for Windows instances, you want to create your AMIs by using Instances > Actions > Image > Create Image. If you try to create a Windows AMI from a volume, it will default to Linux and you won't be able to use them as an AMI to reinstantiate your Windows Server.
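To make "pending changes in buffers" concrete, here is a minimal sketch in Python, not tied to EBS in any way. It assumes a small write that fits in Python's default file buffer; the file names and helper functions are made up for illustration. It shows that data written by an application may not have reached the operating system yet, which is exactly what a crash-consistent snapshot can miss.

```python
import os
import tempfile


def write_without_flush(path, data):
    """Write data but leave it in Python's userspace buffer; return the open file."""
    f = open(path, "wb")
    f.write(data)
    return f


def durable_write(path, data):
    """Write data, then push it through the userspace buffer and the OS cache."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # userspace buffer -> OS page cache
        os.fsync(f.fileno())  # ask the OS to write its cache to the device


workdir = tempfile.mkdtemp()
pending = os.path.join(workdir, "pending.bin")  # hypothetical file names
durable = os.path.join(workdir, "durable.bin")

handle = write_without_flush(pending, b"x" * 100)
size_before_flush = os.path.getsize(pending)  # what the OS has seen so far
handle.close()

durable_write(durable, b"x" * 100)
size_after_fsync = os.path.getsize(durable)
```

A snapshot taken while `handle` is still open would see the file as the OS sees it at that moment, not as the application believes it to be. Databases add their own layers of buffering on top of this, which is why application-consistent backups need the application's cooperation.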

Related

GCE snapshot vs VMware snapshot

I am new to GCE and am looking for a way to back up our GCE VM.
According to the Google documentation, Google recommends backing up VMs by creating snapshots. In VMware, however, using snapshots as a backup method is not recommended, because the delta disk keeps growing and the system may become unstable.
Does GCE handle snapshots differently from VMware, so that in GCE snapshots can be used as a backup method?
Thanks
Yes, you can use snapshots as backups. You can back up a persistent disk by creating snapshots, even while the disk is attached to a running instance. Snapshots are global resources, so you can use them to restore data to a new disk or instance within the same project. You can also share snapshots across projects.
You can refer to the public documentation of GCP on persistent disk snapshots: https://cloud.google.com/compute/docs/disks/create-snapshots
In GCP you can also use instance templates to create VMs, and keep a template as a backup of a VM's configuration for when you need to recreate it. Instance templates define the machine type, boot disk image or container image, labels, and other instance properties. You can then use an instance template to create a MIG or to create individual VMs. Instance templates are a convenient way to save a VM instance's configuration so you can use it later to create VMs or groups of VMs.
You can refer to the public documentation of GCP on Instance templates : https://cloud.google.com/compute/docs/instance-templates
Note on backups and persistent disk snapshots
"If you create a snapshot of your persistent disk while your application is running, the snapshot might not capture pending writes that are in transit from memory to disk." -- Best practices for persistent disk snapshots. This could lead to snapshots which aren't actually a good backup, because the disk is in an unexpected state. The linked doc gives recommendations on pausing applications before snapshotting to ensure consistent backups.
Another option (for applications which support it) is to do application-level backups using their own tools. Databases, for example, usually have their own backup tooling, which understands application state and creates reliable backups.
Are GCE persistent disk snapshots different from VMware snapshots?
Yes, they are different. GCE PD doesn't stack deltas of the changes over time. The two systems use the word snapshot, but the underlying mechanism is different.
In VMware, if you take multiple snapshots, each snapshot freezes the current disk state, and later snapshots are based on the differences (deltas) from the previous ones. This creates a chain of differences, with the disk itself being the most recent state. A snapshot is a point in time in the evolution of the disk data. This allows rolling the disk back directly, but if you store a nightly snapshot as a backup, you can't just delete the old snapshots, because the new ones rely on them. (There are options to compact them, though.)
In Compute Engine, snapshots are separate objects and each snapshot is based on the state of the original disk at that time. A snapshot is a logical copy of the disk at a certain time. If you take nightly backups, you can delete old ones because they are independent from each other.
Also, Compute Engine snapshots are only snapshots of the disk, not the machine configuration. Machine Images capture both machine configuration and the disk state.
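The difference in deletion behaviour described above can be sketched with a toy model. This is only an illustration of the user-visible behaviour as described in this answer, not of either product's actual storage format; the class and method names are invented for the example.

```python
class DeltaChain:
    """Toy model of chained snapshots: each one stores only changes since its parent."""

    def __init__(self, base):
        self.layers = [dict(base)]  # layers[0] is the base disk state

    def snapshot(self, changes):
        self.layers.append(dict(changes))
        return len(self.layers) - 1

    def restore(self, index):
        # Rebuilding a state needs every layer up to and including `index`.
        state = {}
        for layer in self.layers[: index + 1]:
            if layer is None:
                raise RuntimeError("a layer this snapshot depends on was deleted")
            state.update(layer)
        return state

    def delete(self, index):
        self.layers[index] = None  # naive delete, no consolidation step


class IndependentSnapshots:
    """Toy model where every snapshot can be restored on its own."""

    def __init__(self):
        self.snapshots = {}

    def snapshot(self, name, disk):
        self.snapshots[name] = dict(disk)  # logical full copy

    def restore(self, name):
        return dict(self.snapshots[name])

    def delete(self, name):
        del self.snapshots[name]


# Deleting an old link in the chain breaks the newer snapshot.
chain = DeltaChain({"a": 1})
first = chain.snapshot({"b": 2})
second = chain.snapshot({"a": 3})
chain.delete(first)
try:
    chain.restore(second)
    chain_ok = True
except RuntimeError:
    chain_ok = False

# Deleting an old independent snapshot leaves the newer one usable.
nightly = IndependentSnapshots()
nightly.snapshot("mon", {"a": 1, "b": 2})
nightly.snapshot("tue", {"a": 3, "b": 2})
nightly.delete("mon")
tuesday = nightly.restore("tue")
```

In the chained model, safe deletion requires merging the deleted layer into its neighbour first, which is the compaction mentioned above.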

What is difference between snapshot, image and persistent disk on Google Compute Engine?

I have a Compute Engine instance on Google Cloud which is running fine. The user base is increasing, and I wish to upgrade to a bigger Compute Engine instance in terms of CPU and memory.
What is the easiest way to do such a migration?
What are the snapshot, image, and persistent disk features in Google Compute Engine? Are they in any way useful for my task?
I figured it out. Lennert's answer is good; I will add a few things to complete it. You can always stop a VM, edit the CPU/memory, and restart the VM. But this may change the external IP address and cause a lot of issues. You can handle that, but it may cause further downtime: you may have to update the IP address in DNS and inside your code. One way to avoid this hassle is to reserve a static IP address [in the console, go to NETWORKING > EXTERNAL IP ADDRESS > RESERVE A STATIC IP ADDRESS]. If you do this, your IP address will not change when you restart the VM.
An image is essentially the operating system. When creating a VM, you are asked to choose a boot disk, the disk your VM boots from. You can select from predefined images.
A snapshot is a copy of a disk. If it is a boot disk, the snapshot contains the operating system image too. You can create a snapshot of an existing disk and use it as the boot disk when creating a new VM.
A persistent disk is a disk that can persist even if you delete the VM [provided you have deselected the option to delete it along with the VM]. You can delete the VM and use the persistent disk to create new ones. You can also pay for the persistent disk alone, without having any VM.
The easiest way is to stop the machine, change the machine type from the console, and start the machine again. No need to create backups (snapshots), new VMs, etc.
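The same stop, change machine type, start sequence can also be done from the command line rather than the console. Below is a rough sketch in Python that builds the corresponding gcloud commands; the instance name, zone, and machine type are placeholders, not values from the question, and the commands are only executed if you call run_all yourself.

```python
import subprocess


def resize_commands(instance, zone, machine_type):
    """Return the gcloud commands for an in-place resize, in the order to run them."""
    return [
        ["gcloud", "compute", "instances", "stop", instance, "--zone", zone],
        ["gcloud", "compute", "instances", "set-machine-type", instance,
         "--machine-type", machine_type, "--zone", zone],
        ["gcloud", "compute", "instances", "start", instance, "--zone", zone],
    ]


def run_all(commands, runner=subprocess.run):
    """Run each command in turn and stop at the first failure."""
    for command in commands:
        runner(command, check=True)


# Placeholder values for illustration only.
plan = resize_commands("my-vm", "us-central1-a", "n1-standard-4")
```

Calling `run_all(plan)` would execute the three steps. The machine type can only be changed while the instance is stopped, which is why the order matters. As noted above, an ephemeral external IP address may change across the stop and start, so reserve a static address first if that matters.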

Can I install MySQL on the VMs provided in Azure Cloud Services?

From what I gather, the only way to use a MySQL database with Azure websites is to use ClearDB, but can I install MySQL on VMs provided in Azure Cloud Services? And if so, how?
This question might get closed and moved to ServerFault (where it really belongs). That said: ClearDB provides MySQL-as-a-Service in Azure. It has nothing to do with what you can install in your own Virtual Machines. You can absolutely do a VM-based MySQL install (or any other database engine that you can install on Linux or Windows). In fact, the Azure portal even has a tutorial for a MySQL installation on OpenSUSE.
If you're referring to installing in web/worker roles: This simply isn't a good fit for database engines, due to:
the need to completely script/automate the install with zero interaction (which might take a long time). This includes all necessary software being downloaded/installed to the vm images every time a new instance is spun up.
the likely inability for a database cluster to cope with arbitrary scale-out (the typical use case for web/worker roles). Database clusters may or may not work well when a scale-out occurs (adding an additional vm). Same thing when scaling in (removing a vm).
less-optimal attached-storage configuration
inability to use Linux VMs
So, assuming you're still ok with Virtual Machines (vs stateless Cloud Service vm's): You'll need to carefully plan your deployment, with decisions such as:
Distro (Ubuntu, CentOS, etc). Azure-supported Linux distro list here
Selecting proper VM size (the DS series provide SSD attached disk support; the G series scale to 448GB RAM)
Azure Storage attached disks being non-Premium or Premium (premium disks are SSD-backed, durable disks scaling to 1TB/5000 IOPS per disk, up to 32 disks per VM depending on VM size)
Virtual network configuration (for multi-node cluster)
Accessibility of database cluster (whether your app is in the vnet or accesses it through a public endpoint; and if the latter, setting up ACL's)
Backup / HA / DR planning
Someone else mentioned using a pre-built VM image from VM Depot. Just realize that, if you go that route, you're relying on someone else to configure the database engine install for you. This may or may not be optimal for what you're trying to achieve. And the images may or may not be up-to-date with the latest versions, patches, etc.
Of course, what I wrote applies to any database engine you install in your own virtual machines, where a service provider (such as ClearDB) tends to take care of most of these things for you.
If you are talking about standard VMs, then you can use a pre-built image from VM Depot for that.
If you are talking about web or worker roles (PaaS), I wouldn't recommend it, but if you really want to, you could. You would need to fully script the install of the solution on the host. The only downside (and it's a big one) is that the instance will be moved to a new host at some point, which would mean your MySQL data files would be lost. If you back up frequently and are happy to lose some data, this option may work for you.
I think the main question is what you want to achieve. As I see it, you want to use a PaaS solution with Web Apps or Cloud Services, and you need a MySQL database. If so, you have two options (both technically possible, as David Makogon said). The first is to deploy your own single MySQL server and connect to it from the outside (over the internet). The second is to create a MySQL server or cluster and connect your application to it internally over an Azure virtual network. With Cloud Services this is simple, but with Web Apps it is not: you must create a VPN gateway in an Azure VM and connect your Web App to that gateway. That way you will have an internal connection from your application to your own MySQL cluster.

Google Compute Engine adding disk and SSL support

I am new to GCE. I was able to create a new instance using the gcutil tool and the GCE console. A few things are unclear to me and I need help:
1) Does GCE provide a persistent disk when a new instance is created? I think it is 10 GB by default, but I am not sure. What is the right way to stop the instance without losing the data saved on it, and what would the charge be (US zone) if, say, I need 20 GB of disk space?
2) If I need SSL to enable HTTPS, are there any extra steps I should take? I think I will need to add a firewall rule with the gcutil addfirewall command and create a certificate (or install one from a third party)?
1) Persistent disk is definitely the way to go if you want a root drive on which data retention is independent of the life cycle of any virtual machine. When you create a Compute Engine instance via the Google Cloud Console, the “Boot Source” pull-down menu presents the following options for your boot device:
New persistent disk from image
New persistent disk from snapshot
Existing persistent disk
Scratch disk from image (not recommended)
The default option is the first one ("New persistent disk from image"), which creates a new 10 GB PD, named after your instance name with a 'boot-' prefix. You could also separately create a persistent disk and then select the "Existing persistent disk" option (along with the name of your existing disk) to use an existing PD as a boot device. In that case, your PD needs to have been pre-loaded with an image.
Re: your question about cost of a 20 GB PD, here are the PD pricing details.
Read more about Compute Engine persistent disks.
2) You can serve SSL/HTTPS traffic from a GCE instance. As you noted, you'll need to configure a firewall to allow your incoming SSL traffic (typically port 443) and you'll need to configure https service on your web server and install your desired certificate(s).
Read more about Compute Engine networking and firewalls.
As an alternative approach, I would suggest deploying VMs using Bitnami. There are many stacks to choose from, and this will save you time when deploying the VM. I would suggest going with SSD disks: the pricing of magnetic disks and SSDs is close, but the performance boost is huge.
As for serving content over SSL, you need to decide how the requests will be processed. You can use the NGINX or Apache web servers. In that case you would need to configure virtual hosts for the default ports: 80 for unencrypted traffic and 443 for SSL traffic.
The easiest way to serve SSL traffic from your VM is to generate SSL certificates using the Let's Encrypt service.

Java EE application deployment on Amazon EC2

We have a Java EE application (EAR file deployed on JBoss, MySQL, MongoDB) which we would like to deploy on an Amazon EC2 instance. I have several questions regarding deployment best practices.
What is the most commonly used Linux AMI which we can rely on for a robust deployment (There are so many Linux variants, and I am not sure which AMI is commonly used, is it Fedora, CentOS, Red Hat, SUSE ...)
How do we handle production upgrades (EAR file modifications or schema upgrades). Are there any tools which are available to handle this installation or rollback of these changes.
What kind of data backup capability is available for the database?
Should I rely on Amazon RDS for MySQL support?
How should I handle support for MongoDB?
This is the first time I am hosting a web app, and I would appreciate some input on how to manage the production instance.
I agree with Mark Robinson's answer: Use whichever Unix variant you're most comfortable with. It may pay to pick one with decent cloud support. For my site I use Ubuntu.
I have a common image which is the base of every version deploy I do. I have www.mysite.com pointing to an Elastic IP so I can decide which instance it goes to. The common image has all the software I need installed (Postgres/Postgis/Tomcat/etc), but the database and web server data folders are symlinked to Elastic Block Store (EBS) volumes.
When it comes time to do a deploy I start a new instance up, freeze and snapshot the EBS volumes on production and make new volumes. I point my new instance at the new volumes and then install whatever I need to onto that. Once I've smoke tested everything successfully I can switch the Elastic IP to point to the new instance and everything keeps on going.
I'll note that I currently have the advantage where only I can modify the database; no users can. This will become a problem shortly.
If you use the XFS filesystem on top of the EBS volume then you can tell XFS to freeze the file system (so no updates happen) then call the EC2 api to snapshot the volume then unfreeze the file system. The result is that the snapshot is taken quickly and sent to S3. I have a nightly script which does this.
If RDS looks like it will suit your needs then use it. Amazon is building lots of solid tools quickly and this will ease your scalability issues if you have any.
I'm sorry, I have no idea.
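The XFS freeze, snapshot, unfreeze sequence described above could be scripted along the following lines. This is a sketch, not the nightly script the answer refers to. It assumes the xfs_freeze utility and the AWS CLI are installed and configured, and the mount point, volume ID, and description are placeholders. The command runner is passed in as a parameter so the ordering can be checked without touching a real system.

```python
import subprocess
from contextlib import contextmanager


def _run(command):
    """Default runner: execute the command and raise if it fails."""
    subprocess.run(command, check=True)


@contextmanager
def frozen(mountpoint, run=_run):
    """Freeze an XFS filesystem for the duration of the with-block."""
    run(["xfs_freeze", "-f", mountpoint])
    try:
        yield
    finally:
        # Always unfreeze, even if the snapshot call failed.
        run(["xfs_freeze", "-u", mountpoint])


def snapshot_frozen_volume(mountpoint, volume_id, description, run=_run):
    """Start an EBS snapshot while writes to the filesystem are suspended."""
    with frozen(mountpoint, run):
        run(["aws", "ec2", "create-snapshot",
             "--volume-id", volume_id,
             "--description", description])
```

For example, `snapshot_frozen_volume("/data", "vol-0123456789abcdef0", "nightly")` with placeholder values. The try/finally matters here: a filesystem left frozen blocks every writer, so the unfreeze has to run whether or not the snapshot request succeeded.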
Good question!
1) I would recommend going with whatever Linux variant you are most comfortable with. If you have someone who is really keen on CentOS, go with that. Once you have selected your AMI, customize it by configuring it how you want it. Then save that AMI as your base layout. It will make rolling out new machines much easier and save your bacon if EC2 goes down.
2) Upgrades with EC2 can be tres cool. Instead of upgrading a live system, take your pre-configured AMI, update that and save that AMI as myAMI-1.1 (or whatever). That way, you can flip over to the new system almost instantly AND roll back to a previous version in case something breaks. You can also back-up DB instances to S3. It's cheap at about $0.10/GB/Month.
3) It depends on where you are storing your DB. If you are storing it on your EC2 instance's own storage, you are in trouble. The EC2 instances have no persistent storage, so if your machine crashes, you lose everything. I'm not familiar with Amazon's DB system, but you should also look into Elastic Block Store. It's basically an actual hard drive you can write to. When you want to upgrade your schema, do a full DB dump to S3 and then upgrade your actual schema. If something goes wrong, you can pull the previous version out of S3.
4) & 5) I have never used those so I can't help you.
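If you keep versioned AMIs as suggested in 2), picking the image to roll back to comes down to finding the newest version older than the one currently running. Here is a small sketch of that selection in Python. It assumes names follow the myAMI-1.1 pattern from above with dotted numeric versions, and it works on a plain list of names, so fetching that list from EC2 is left out.

```python
import re


def parse_version(name, prefix="myAMI-"):
    """Return the version as a tuple of ints, or None if the name does not match."""
    match = re.fullmatch(re.escape(prefix) + r"(\d+(?:\.\d+)*)", name)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def rollback_target(image_names, current, prefix="myAMI-"):
    """Return the newest image name older than `current`, or None if there is none."""
    current_version = parse_version(current, prefix)
    if current_version is None:
        raise ValueError("current image name does not follow the naming scheme")
    best = None
    for name in image_names:
        version = parse_version(name, prefix)
        if version is None or version >= current_version:
            continue  # skip unrelated names and anything not older
        if best is None or version > best[0]:
            best = (version, name)
    return None if best is None else best[1]


# Hypothetical image names for illustration.
names = ["myAMI-1.0", "myAMI-1.1", "myAMI-1.2", "myAMI-1.10", "unrelated-image"]
previous = rollback_target(names, "myAMI-1.10")
```

Comparing versions as tuples of integers rather than as strings keeps 1.10 ordered after 1.2, which plain string sorting would get wrong.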
Any Linux AMI will do the job; all you need is a JRE (assuming no development work is required). If you need to monitor JVM behavior, get JConsole installed.
The easiest, most painless way is to SSH into the server, transfer the updated class files or EAR file (depending on the number of changes) to your home directory, copy them over the old ones in the Tomcat deployment directory, and restart Apache. (Make sure you have tested locally before uploading to production.)
It depends on which database you are using. If you are using MySQL, just schedule a backup that writes to your home directory, so that from time to time you can SSH in and download a copy for backup purposes.
I would not rely on Amazon RDS for MySQL support, for two reasons: MySQL is small and manageable enough, and I would want complete control of the database. Why pay more when you can do it yourself free of charge?
The use of MongoDB should be aligned with the purpose of your application and the benefits you gain from it. I would recommend using MongoDB for static data retrieval such as state, country, area, etc., and MySQL for transactional data only.
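The scheduled MySQL backup mentioned above could be driven by a small script run from cron. The sketch below only builds the mysqldump command and a timestamped output path. The database name and backup directory are placeholders, and credentials are assumed to come from a MySQL option file rather than the command line. The --single-transaction flag gives a consistent dump for InnoDB tables without locking them; it does not help with MyISAM tables.

```python
import datetime
from pathlib import PurePosixPath


def mysqldump_command(database, backup_dir, now=None):
    """Return (argv, output_path) for a timestamped dump of one database."""
    now = now or datetime.datetime.now()
    filename = f"{database}-{now:%Y%m%d-%H%M%S}.sql"
    output_path = str(PurePosixPath(backup_dir) / filename)
    argv = [
        "mysqldump",
        "--single-transaction",        # consistent view for InnoDB tables
        "--routines",                  # include stored procedures and functions
        f"--result-file={output_path}",
        database,
    ]
    return argv, output_path


# Placeholder values for illustration only.
argv, path = mysqldump_command(
    "appdb", "/home/ec2-user/backups",
    now=datetime.datetime(2020, 1, 2, 3, 4, 5),
)
```

Passing `argv` to `subprocess.run(argv, check=True)` would perform the dump. Keeping the only copy in a home directory on the same instance means it is lost along with the instance, so copying the file elsewhere is still needed.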
If you can live with deploying your Java EE application on TomEE instead of JBoss, Boxfuse does what you want.
For your Java EE application you literally only have to execute the following (TomEE uses war files instead of ear files):
boxfuse run my-tomee-app-1.0.war -env=prod
This will
Create AMI containing TomEE and your application ready to boot
Create an Elastic IP or ELB
Create a security group with the correct ports defined
Create an auto-scaling group
Launch your instance(s)
Any subsequent update will be done as a zero downtime blue/green deployment.
More info: https://boxfuse.com/blog/javaee-aws