GCE snapshot vs VMware snapshot - google-compute-engine

I am new to GCE and am looking for a way to back up our GCE VMs.
According to the Google documentation, Google recommends backing up a VM by creating snapshots. In VMware, however, using snapshots as a backup method is not recommended, because the delta disk keeps growing and the system may become unstable.
Does GCE handle snapshots differently from VMware, so that snapshots can be used as a backup method in GCE?
Thanks

Yes, you can use snapshots as backups. You can back up a persistent disk by creating snapshots, even while the disk is attached to a running instance. Snapshots are global resources, so you can use them to restore data to a new disk or instance within the same project. You can also share snapshots across projects.
You can refer to the public documentation of GCP on persistent disk snapshots: https://cloud.google.com/compute/docs/disks/create-snapshots
In GCP you can also use instance templates to create VMs, which lets you recreate a VM's configuration when you need it. Instance templates define the machine type, boot disk image or container image, labels, and other instance properties. You can then use an instance template to create a MIG or to create individual VMs. Instance templates are a convenient way to save a VM instance's configuration so you can use it later to create VMs or groups of VMs. Note that a template saves configuration only, not the data on the disks.
You can refer to the public documentation of GCP on instance templates: https://cloud.google.com/compute/docs/instance-templates

Note on backups and persistent disk snapshots
"If you create a snapshot of your persistent disk while your application is running, the snapshot might not capture pending writes that are in transit from memory to disk." -- Best practices for persistent disk snapshots. This could lead to snapshots which aren't actually a good backup, because the disk is in an unexpected state. The linked doc gives recommendations on pausing applications before snapshotting to ensure consistent backups.
Another option (for applications which support it) is to do application-level backups using the application's own tools. Databases, for example, usually have their own backup tooling, which understands application state and creates reliable backups.
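For the disk-level route, the pause-then-snapshot sequence recommended above can be sketched as an ordered command plan. This is only a sketch under assumptions: the disk name, zone, mount point and snapshot name are hypothetical, the disk is assumed to be a non-boot data disk whose filesystem supports `fsfreeze`, and the function only builds the commands rather than running them.

```python
from typing import List


def consistent_snapshot_plan(
    disk: str, zone: str, mount_point: str, snapshot_name: str
) -> List[List[str]]:
    """Return the commands, in order, for a freeze -> snapshot -> unfreeze cycle."""
    return [
        # Flush dirty pages from memory to disk.
        ["sudo", "sync"],
        # Block new writes to the filesystem so the disk stops changing.
        ["sudo", "fsfreeze", "-f", mount_point],
        # Snapshot the persistent disk while it is quiescent.
        ["gcloud", "compute", "disks", "snapshot", disk,
         "--zone", zone, "--snapshot-names", snapshot_name],
        # Resume writes. In a real script this must run even if the snapshot fails.
        ["sudo", "fsfreeze", "-u", mount_point],
    ]


if __name__ == "__main__":
    # All four values below are made-up examples.
    plan = consistent_snapshot_plan(
        "data-disk", "us-central1-a", "/mnt/data", "data-disk-nightly"
    )
    for cmd in plan:
        print(" ".join(cmd))
```

Keeping the plan as data makes the ordering easy to review before anything touches a production disk. An application that buffers heavily in its own memory would still need to be paused or flushed before the freeze step.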
Are GCE persistent disk snapshots different from VMware snapshots?
Yes, they are different. GCE PD doesn't stack deltas of the changes over time. The two systems use the word snapshot, but the underlying mechanism is different.
In VMware, if you take multiple snapshots, each snapshot freezes the current disk state, and each later snapshot stores only the differences (deltas) from the previous one. The result is a chain of differences, with the live disk as the most recent state. A snapshot is a point in time in the evolution of the disk data. This allows rolling the disk back directly, but if you store a nightly snapshot as a backup, you can't just delete the old snapshots, because the new ones rely on them. (There are options to compact them, though.)
In Compute Engine, snapshots are separate objects and each snapshot is based on the state of the original disk at that time. A snapshot is a logical copy of the disk at a certain time. If you take nightly backups, you can delete old ones because they are independent from each other.
Also, Compute Engine snapshots are only snapshots of the disk, not the machine configuration. Machine Images capture both machine configuration and the disk state.
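The difference in deletion behavior can be illustrated with a toy model. This is a minimal sketch, not how either product stores data: `DeltaChain` and `IndependentSnapshots` are hypothetical classes that only mimic the logical behavior described above, with a disk represented as a dict of block number to contents.

```python
from typing import Dict, List, Optional

Blocks = Dict[int, str]  # block number -> contents


class DeltaChain:
    """Chain-style snapshots: each link holds only the blocks changed since the previous link."""

    def __init__(self, base: Blocks) -> None:
        self.links: List[Optional[Blocks]] = [dict(base)]
        self._last: Blocks = dict(base)

    def snapshot(self, disk: Blocks) -> int:
        delta = {k: v for k, v in disk.items() if self._last.get(k) != v}
        self.links.append(delta)
        self._last = dict(disk)
        return len(self.links) - 1

    def delete(self, index: int) -> None:
        # Naive delete with no consolidation step.
        self.links[index] = None

    def restore(self, index: int) -> Blocks:
        state: Blocks = {}
        for link in self.links[: index + 1]:
            if link is None:
                raise RuntimeError("chain broken: an earlier link was deleted")
            state.update(link)
        return state


class IndependentSnapshots:
    """Each snapshot behaves as a full logical copy of the disk at that time."""

    def __init__(self) -> None:
        self.snaps: Dict[str, Blocks] = {}

    def snapshot(self, name: str, disk: Blocks) -> None:
        self.snaps[name] = dict(disk)

    def delete(self, name: str) -> None:
        del self.snaps[name]

    def restore(self, name: str) -> Blocks:
        return dict(self.snaps[name])


if __name__ == "__main__":
    disk: Blocks = {0: "boot", 1: "data-v1"}
    chain, store = DeltaChain(disk), IndependentSnapshots()
    store.snapshot("mon", disk)

    disk[1] = "data-v2"
    tue = chain.snapshot(disk)
    store.snapshot("tue", disk)

    disk[0] = "boot-patched"
    wed = chain.snapshot(disk)
    store.snapshot("wed", disk)

    # Drop Tuesday's backup in both models.
    chain.delete(tue)
    store.delete("tue")

    print(store.restore("wed"))  # still restorable on its own
    try:
        chain.restore(wed)
    except RuntimeError as err:
        print(err)  # the chain needs every earlier link
```

In the chain model, restoring Wednesday walks every earlier link, so removing Tuesday breaks it unless the deltas are consolidated first. In the independent model, each nightly backup can be deleted without affecting the others.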

Related

Is it recommended to run clustered database with Kubernetes in production environment?

Is it reasonable to use Kubernetes for a clustered database such as MySQL in a production environment?
There are example configurations, such as the MySQL Galera example. However, most examples do not make use of persistent volumes. As far as I understand, persistent volumes must reside on some shared file system, as defined in Kubernetes types of persistent volumes. A shared file system will not guarantee that the database files of the pod will be local to the machine hosting the pod. They will be accessed over the network, which is rather slow. Moreover, there are issues with MySQL and NFS, for example.
This might be acceptable for a test environment. However, what should I do in a production environment? Is it better to run the database cluster outside Kubernetes and run only application servers with Kubernetes?
The Kubernetes project introduced PetSets, a new pod management abstraction intended to run stateful applications. It is an alpha feature at present (as of version 1.4) and is moving rapidly. The various issues to resolve as we move to beta are listed here. Quoting from the section on when to use PetSets:
A PetSet ensures that a specified number of "pets" with unique identities are running at any given time. The identity of a Pet is comprised of:
a stable hostname, available in DNS
an ordinal index
stable storage: linked to the ordinal & hostname
In addition to the above, it can be coupled with several other features which help one deploy clustered stateful applications and manage them. Coupled with dynamic volume provisioning for example, it can be used to provision storage automatically.
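To make the identity idea concrete, here is a small sketch of how stable per-pet names can be derived from an ordinal. The `<set name>-<ordinal>` hostname pattern and the `<claim template>-<hostname>` volume claim pattern are assumptions for illustration, modeled on the naming convention Kubernetes uses for this kind of workload; `pet_identities` is a hypothetical helper, not a Kubernetes API.

```python
from typing import List, NamedTuple


class PetIdentity(NamedTuple):
    ordinal: int
    hostname: str      # stable name, resolvable in DNS
    volume_claim: str  # storage tied to the ordinal and hostname


def pet_identities(
    set_name: str, replicas: int, claim_template: str = "data"
) -> List[PetIdentity]:
    """Derive one stable identity per replica from its ordinal index."""
    identities = []
    for ordinal in range(replicas):
        hostname = f"{set_name}-{ordinal}"
        identities.append(
            PetIdentity(ordinal, hostname, f"{claim_template}-{hostname}")
        )
    return identities


if __name__ == "__main__":
    # "mysql" and the replica count are made-up example values.
    for pet in pet_identities("mysql", 3):
        print(pet.ordinal, pet.hostname, pet.volume_claim)
```

Because the names depend only on the set name and the ordinal, a replacement for pet 1 gets the same hostname and the same volume claim as the pet it replaces, which is what lets a clustered database node rejoin with its data.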
There are several YAML configuration files available (such as the ones you referenced) using ReplicaSets and Deployments for MySQL and other databases which may be run in production and are probably being run that way as well. However, PetSets are expected to make it a lot easier to run these types of workloads, while supporting upgrades, maintenance, scaling and so on.
You can find some examples of distributed databases with petsets here.
The advantage of provisioning persistent volumes which are networked and non-local (such as GlusterFS) is realized at scale. However, for relatively small clusters, there is a proposal to allow for local storage persistent volumes in the future.

What is difference between snapshot, image and persistent disk on Google Compute Engine?

I have a Compute Engine instance on Google Cloud which is running fine. The user base is increasing and I wish to upgrade to a bigger instance in terms of CPU and memory.
What is the easiest way to do such a migration?
What are the snapshot, image, and persistent disk features in Google Compute Engine? Are they in any way useful for my task?
I figured it out. Lennert's answer is good; I will add a few more things to complete it. You can always stop a VM, edit the CPU/memory, and restart the VM. But this may change the external IP address and cause a lot of issues. You can handle that, but it may cause further downtime: you may have to update the IP address in DNS and inside your code. One way to avoid this hassle is to reserve a static IP address [in the console, go to NETWORKING > EXTERNAL IP ADDRESS > RESERVE A STATIC IP ADDRESS]. If you do this, your IP address will not change when you restart the VM.
An image is essentially the operating system. While creating a VM, you are asked to choose a boot disk, the disk your VM boots from. You can select from predefined images.
A snapshot is a copy of a disk. If it is a boot disk, it contains the operating system image too. We can create a snapshot of an existing disk and use it as the boot disk when creating a new VM.
A persistent disk is a disk that can persist even if you delete the VM [provided you have deselected the option to delete the disk along with the VM]. We can delete the VM and use the persistent disk to create new ones. We can also pay for the persistent disk alone, without having any VM.
The easiest way is to stop the machine, change the machine type from the console and start the machine again. No need to create backups (snapshots), new VM's, etc.
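The same stop, change type, start sequence can also be done from the command line. The sketch below only builds the `gcloud` commands in order and does not execute them; the instance name, zone and target machine type are hypothetical example values.

```python
from typing import List


def resize_plan(instance: str, zone: str, machine_type: str) -> List[List[str]]:
    """Return the gcloud commands, in order, to change a VM's machine type."""
    base = ["gcloud", "compute", "instances"]
    return [
        # The machine type can only be changed while the VM is stopped.
        base + ["stop", instance, "--zone", zone],
        base + ["set-machine-type", instance, "--zone", zone,
                "--machine-type", machine_type],
        base + ["start", instance, "--zone", zone],
    ]


if __name__ == "__main__":
    # Example values only; substitute your own instance, zone and type.
    for cmd in resize_plan("web-1", "us-central1-a", "n1-standard-4"):
        print(" ".join(cmd))
```

The persistent disk is untouched by this sequence, so no data is copied. An ephemeral external IP may still change across the stop and start unless a static address has been reserved.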

Amazon EC2 backup options and differences

I have an EC2 Instance running Windows Server 2012. My server is running a MySQL database, a wordpress website, and a Web Service, all in IIS. I installed these manually and configured them myself if that's important.
I looked into methods for backing up, and came across EBS Snapshots (Elastic Block Store > Snapshots > Create Snapshot), and Images (Instances > Actions > Image > Create Image). From my understanding, an EBS Snapshot is a snapshot (backup) of any attached EBS volumes (in my case the root drive C). An Image is an image of the entire instance. Am I correct so far in my descriptions of the two methods?
I want to have a backup of my server as described above (database, WordPress, web service, IIS settings). Would an EBS Snapshot suffice for this? That is, if my instance or EBS volume fails one day, and I recreate an instance and attach a volume restored from the EBS snapshot, will my server be configured the same as the failed instance (database, WordPress, web service, IIS settings, etc.)?
I am assuming an Image restore to a new instance will mean absolutely everything will be restored as on the initial instance correct?
So with all that said, would an EBS snapshot be enough as a backup solution?
An EBS Snapshot is a crash-consistent backup of your volume. Crash-consistent means that it is as good as if your machine had crashed (for example, if you unplugged your computer). Open files that have pending changes in buffers might not have been persisted to disk.
So it is not an application consistent backup such as those that can be done with applications that support VSS snapshots, but probably it will be good enough for basic disaster recovery.
Here you will find more info about crash consistent and application consistent backups.
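The "pending changes in buffers" point can be seen with a small experiment. This sketch only illustrates application-level buffering in Python, which is one of several layers where writes can sit before reaching the volume; the file name and contents are made up.

```python
import os
import tempfile
from typing import Tuple


def pending_write_demo() -> Tuple[int, int]:
    """Return the file size seen before and after the application flushes its buffer."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "orders.log")
        f = open(path, "wb")  # default buffering keeps small writes in memory
        f.write(b"order-1\n")
        # What a copy of the file taken right now would contain.
        before = os.path.getsize(path)
        f.flush()             # hand the bytes to the operating system
        os.fsync(f.fileno())  # ask the operating system to persist them
        after = os.path.getsize(path)
        f.close()
    return before, after


if __name__ == "__main__":
    before, after = pending_write_demo()
    print("bytes visible before flush:", before)
    print("bytes visible after flush and fsync:", after)
```

A crash-consistent snapshot corresponds to the "before" moment: whatever the application has not flushed is simply absent. An application-consistent backup asks the application to flush first, which is what VSS-aware tools coordinate on Windows.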
Another important caveat is that for Windows instances, you want to create your AMIs by using Instances > Actions > Image > Create Image. If you try to create a Windows AMI from a volume, it will default to Linux and you won't be able to use them as an AMI to reinstantiate your Windows Server.

Attaching a disk to multiple GCE instances with one write access

I'm thinking about upgrading my company's integration server, with the repos on a separate disk that would be shared with a backup server. Like so:
[Main Integration Server] ---R/W--- [Repo Vdisk] ---R/O--- [Backup Integration Server]
My problem is that according to the GCE docs, if I attach the same Vdisk to more than one instance, all instances must only access the disk in read-only mode. What I'm looking to do would be to have one instance access it in read-write, and one in read-only mode.
Is this at all possible without powering up a third instance to act as a sort of "storage server"?
As you quoted from the docs and as mentioned in my earlier answer, if you attach a single persistent disk to multiple instances, they must all mount it in read-only mode.
Since you're looking for a fully-managed storage alternative so you don't have to run and manage another VM yourself, consider using Google Cloud Storage and mount your bucket with gcsfuse which will make it look like a regular mounted filesystem.

Google Compute Engine adding disk and SSL support

I am new to GCE. I was able to create a new instance using the gcutil tool and the GCE console. A few things are unclear to me and I need help:
1) Does GCE provide a persistent disk when a new instance is created? I think it's 10GB by default, but I'm not sure. What is the right way to stop the instance without losing the data saved on it, and what will the charge be (US zone) if, say, I need 20GB of disk space?
2) If I need SSL to enable HTTPS, is there any extra step I should take? I think I will need to add a firewall rule with the gcutil addfirewall command and create a certificate (or install one from a third party)?
1) Persistent disk is definitely the way to go if you want a root drive on which data retention is independent of the life cycle of any virtual machine. When you create a Compute Engine instance via the Google Cloud Console, the “Boot Source” pull-down menu presents the following options for your boot device:
New persistent disk from image
New persistent disk from snapshot
Existing persistent disk
Scratch disk from image (not recommended)
The default option is the first one ("New persistent disk from image"), which creates a new 10 GB PD, named after your instance name with a 'boot-' prefix. You could also separately create a persistent disk and then select the "Existing persistent disk" option (along with the name of your existing disk) to use an existing PD as a boot device. In that case, your PD needs to have been pre-loaded with an image.
Re: your question about cost of a 20 GB PD, here are the PD pricing details.
Read more about Compute Engine persistent disks.
2) You can serve SSL/HTTPS traffic from a GCE instance. As you noted, you'll need to configure a firewall to allow your incoming SSL traffic (typically port 443) and you'll need to configure https service on your web server and install your desired certificate(s).
Read more about Compute Engine networking and firewalls.
As an alternative approach, I would suggest deploying VMs using Bitnami. There are many stacks you can choose from, and this will save you time when deploying the VM. I would suggest you go with SSD disks, as the pricing is close between magnetic disks and SSDs, but the performance boost is huge.
As for serving the content over SSL, you need to figure out how the requests will be processed. You can use NGINX or Apache servers. In this case you would need to configure the virtual hosts for the default ports: 80 for non-encrypted and 443 for SSL traffic.
The easiest way to serve SSL traffic from your VM is to generate SSL certificates using the Let's Encrypt service.