Deleting incremental snapshots before an occasional full backup (Google Cloud Compute Engine)

Regarding snapshots on Google Cloud Compute Engine, I have some questions I could not find answers to in the documentation:
We have a two-hourly snapshot frequency for some of our disks. The documentation says that, at unspecified times, a full image of the disk is captured. If I do not need to restore anything from before the latest full image, does this mean that all snapshots prior to that full image could be deleted?
If so, how do I identify the snapshots that can be deleted?
Or: Is there even a way to accomplish this task automatically (e.g. something like "delete all prior incremental images after the latest full image")?

Let me provide some links to the documentation that should answer your questions:
According to the documentation Working with persistent disk snapshots:
Compute Engine uses incremental snapshots so that each snapshot contains only the data that has changed since the previous snapshot.
On the other hand, as @Peter Sonntag noted, according to the documentation Use existing snapshots as a baseline for subsequent snapshots:
Important: Snapshots are incremental by default to avoid billing you for redundant data, to minimize use of storage space, and to decrease snapshot creation latency. However, to ensure the reliability of snapshot history, a snapshot might occasionally capture a full image of the disk.
According to the documentation Snapshot deletion:
When you delete a snapshot, Compute Engine immediately marks the snapshot as DELETED in the system. If the snapshot has no dependent snapshots, it is deleted outright. However, if the snapshot does have dependent snapshots:
Any data that is required for restoring other snapshots is moved into the next snapshot, increasing its size.
Any data that is not required for restoring other snapshots is deleted. This lowers the total size of all your snapshots.
The next snapshot no longer references the snapshot marked for deletion, and instead references the snapshot before it.
To automate deletion of snapshots, you can use a Snapshot retention policy:
A snapshot retention policy defines how long you want to keep your snapshots.
If you choose to set up a snapshot retention policy, you must do so as part of your snapshot schedule.
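Note that, as far as I can tell, the API does not label which snapshots were captured as full images; and given the deletion semantics quoted above, deleting older snapshots can never lose data that newer snapshots still need. So in practice you do not have to identify the full ones at all and can simply prune by age, which is exactly what a retention policy does. If you want the same behavior from your own script, here is a minimal sketch in Python, assuming the google-cloud-compute package; the project ID, disk name, and retention window are placeholders:

```python
# Prune snapshots of one source disk that are older than a retention window.
# Safe per the deletion semantics above: data still needed by newer
# snapshots is moved forward, not lost.
from datetime import datetime, timedelta, timezone

from google.cloud import compute_v1

PROJECT = "my-project"        # placeholder project ID
DISK_NAME = "my-disk"         # placeholder source disk name
RETENTION = timedelta(days=14)

client = compute_v1.SnapshotsClient()
cutoff = datetime.now(timezone.utc) - RETENTION

for snap in client.list(project=PROJECT):
    # source_disk is a full URL ending in .../zones/ZONE/disks/DISK_NAME
    if not snap.source_disk.endswith(f"/disks/{DISK_NAME}"):
        continue
    # creation_timestamp is an RFC 3339 string with a UTC offset
    created = datetime.fromisoformat(snap.creation_timestamp)
    if created < cutoff:
        print(f"Deleting {snap.name} (created {created})")
        client.delete(project=PROJECT, snapshot=snap.name)
```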

Related

Google cloud compute instance metrics taking up disk space

I have a google cloud compute instance set up, but it's getting low on disk space. It looks like the /mnt/stateful_partition/var/lib/metrics directory is taking up a significant amount of space (3+ GB). I assume these are compute metrics, but I can't find any way to safely remove them other than just deleting the files. Is this going to cause any issues?
The path you are referring to contains file system directories used by the GCE VM instance, and you are correct that the metrics folder is safe to remove. To learn more about these directories, see Disks and file system overview.
I would also suggest creating a snapshot first if you want to make sure the changes you make on your instance won't affect your system, so that you can easily revert to the previous instance state.
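If you want to script that safety step, here is a minimal sketch using the google-cloud-compute package; project, zone, and disk name are placeholders:

```python
# Snapshot the instance's disk before deleting the metrics files,
# so the previous state can be restored if anything goes wrong.
from google.cloud import compute_v1

disks = compute_v1.DisksClient()
operation = disks.create_snapshot(
    project="my-project",        # placeholder project ID
    zone="us-central1-a",        # placeholder zone
    disk="my-instance",          # placeholder boot disk name
    snapshot_resource=compute_v1.Snapshot(name="pre-cleanup-snapshot"),
)
operation.result()  # block until the snapshot is created
print("Snapshot created; safe to remove the metrics directory now.")
```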

Why cloning an RDS cluster is faster and more space efficient than restoring a snapshot

I want to create a duplicate (clone) of an Aurora DB cluster.
Both the source and the copy are in the same region, and both are for dev purposes.
Both are MySQL.
I want to access each cluster via a different URL.
I have been reading about the copy-on-write protocol for Aurora cloning and about SQL snapshots.
The AWS docs state that: "Creating a clone is faster and more space-efficient than physically copying the data using a different technique such as restoring a snapshot." (source)
Yet I don't quite understand why using a snapshot is an inferior solution.
A snapshot is slower because the first snapshot copies the entire DB storage:
The amount of time it takes to create a DB cluster snapshot varies with the size of your databases. Since the snapshot includes the entire storage volume, the size of files, such as temporary files, also affects the amount of time it takes to create the snapshot.
So if your database has, let's say, 100 GB, the first snapshot will require copying 100 GB. This operation can take time.
In contrast, when you clone, no copy is done at first. Both the original and the new database use the same storage. Only when a write operation is performed do they start to diverge.
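To make the contrast concrete, cloning is exposed as a point-in-time restore with a copy-on-write restore type. A minimal sketch with boto3; cluster identifiers, region, and instance class are placeholders:

```python
# Clone an Aurora cluster: no storage is copied up front, the clone
# shares pages with the source until either side writes to them.
import boto3

rds = boto3.client("rds", region_name="us-east-1")  # placeholder region

rds.restore_db_cluster_to_point_in_time(
    SourceDBClusterIdentifier="source-cluster",     # placeholder
    DBClusterIdentifier="dev-clone",                # placeholder
    RestoreType="copy-on-write",
    UseLatestRestorableTime=True,
)

# The call above creates only the cluster; add an instance so the
# clone is reachable at its own endpoint (the "different URL" above).
rds.create_db_instance(
    DBInstanceIdentifier="dev-clone-instance-1",    # placeholder
    DBClusterIdentifier="dev-clone",
    DBInstanceClass="db.t3.medium",                 # placeholder
    Engine="aurora-mysql",
)
```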

GCP SQL Cloud - Empty MySQL Database taking 1.2GB of storage space

I'm just looking for a clarification, as the documentation states that an empty, just-created SQL instance should only take approximately 250 MB.
Quoting from documentation about anything I could find on storage space:
MySQL Second Generation instances: The most recent 7 automated backups, and all on-demand backups, are retained. They are charged at the backup storage rate. Binary logs use storage space (not backup space), and are charged as storage.
For the purpose of this test, binary logging is disabled.
MySQL Second Generation [...]: Storage is calculated based on the amount of storage you have provisioned for your instance. Storage for backups is charged by how much space your backups are using. Storage is charged whether your instance is on or off.
Again, this is a freshly created instance. It should use no storage space.
A newly created database uses about 270MB of space for system tables and InnoDB logs. If you create a per-use instance but do not use it, you are still charged for the storage cost.
This is where I got the idea about "250 MB" as initial storage space.
As you can see, however, a newly created database takes around 1.2 GB.
I'd like some clarification on it if someone has any.
Sources:
https://cloud.google.com/sql/faq#storage_limits
https://cloud.google.com/sql/docs/mysql/pricing#2nd-gen-storage-networking-prices
I've been looking into this, and the thing you should take into account is that the information you quoted about empty Cloud SQL MySQL instances occupying around 270 MB applies to First Generation instances, not Second Generation ones. I think that answers your question.
At first I interpreted it the same way you did, but the only points where 270 MB empty instances are specified are here and here, both of which sit within the "MySQL First Generation Pricing" category on the right.
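If you want to check how much of that space is actual data rather than InnoDB system files and logs, you can ask the instance itself. A minimal sketch, assuming the pymysql package and placeholder connection details:

```python
# Report per-schema data + index size, to compare against the ~1.2 GB
# the Cloud SQL console shows for the whole storage volume.
import pymysql

conn = pymysql.connect(host="127.0.0.1", user="root", password="...")  # placeholders
with conn.cursor() as cur:
    cur.execute(
        "SELECT table_schema,"
        " ROUND(SUM(data_length + index_length) / 1024 / 1024, 1)"
        " FROM information_schema.tables GROUP BY table_schema"
    )
    for schema, size_mb in cur.fetchall():
        print(f"{schema}: {size_mb} MB")
```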
Hope this helps.

Large log data in Couchbase

I have a Couchbase server storing a huge amount of data.
The data grows daily, but I also delete it daily after processing it.
Currently it has an item count of about 1,320,168, with 2.97 GB of data usage.
But why is disk usage so large, at 135 GB?
My disk is running low on space to store more data.
Could I delete these data/log files to reduce disk usage?
Couchbase uses an append-only format for storage. This means that every update or delete operation is actually stored as a new entry in the storage file and consumes more disk space.
Then a process called compaction occurs that reclaims the unnecessarily used space. Compaction can either be configured to run automatically, when a certain fragmentation percentage is reached in your cluster, or be run manually on each node.
IIRC auto-compaction is not on by default.
So what you probably want to do is run compaction on your cluster. Note that it may itself require quite a large amount of disk space, as noted here...
See the doc on how to perform compaction (in your case I guess you have an "off-peak" window at the end of the business day, where you currently delete and could perform compaction).
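If you prefer to script that off-peak compaction rather than trigger it from the UI, the cluster exposes it over REST. A minimal sketch with the requests package, assuming the documented compactBucket endpoint; host, bucket, and credentials are placeholders:

```python
# Trigger manual compaction of one bucket to reclaim the append-only
# storage left behind by the daily deletes.
import requests

HOST = "http://127.0.0.1:8091"       # placeholder cluster address
BUCKET = "my-bucket"                 # placeholder bucket name

resp = requests.post(
    f"{HOST}/pools/default/buckets/{BUCKET}/controller/compactBucket",
    auth=("Administrator", "password"),  # placeholder credentials
)
resp.raise_for_status()  # a 200 response means compaction has started
```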
PS: Maybe the folks in the official forums have more insight and recommendations to offer.

Google Compute Engine: what is the difference between disk snapshot and disk image?

I've been using both for my startup, and to me the functionality seems the same. Until now, the instances I've been creating are only for computation. I'm wondering how GCE disk images and snapshots differ in terms of technology, and in which situations it is better to use one over the other.
A snapshot reflects the contents of a persistent disk at a concrete instant in time. An image is the same thing, but includes an operating system and boot loader and can be used to boot an instance.
Images and snapshots can be public or private. In the case of images, public ones may or may not be official images provided by Google.
Snapshots are stored as diffs (a snapshot is stored relative to the previous one, though that is transparent to you) while images are not. They are also cheaper: $0.026 per GB/month vs. $0.050 for images (snapshot pricing is increasing to $0.050/GB/month on October 1, 2022).
These days the two concepts are quite similar. It's now possible to start an instance using a snapshot instead of an image, which is an easy way of resizing your boot partition. Using snapshots may be simpler for most cases.
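Here is what that snapshot-based resize looks like in practice. A minimal sketch, assuming the google-cloud-compute package; project, zone, snapshot name, and sizes are placeholders:

```python
# Create a new, larger disk from an existing snapshot; an instance can
# then boot from it, which effectively grows the boot partition.
from google.cloud import compute_v1

disks = compute_v1.DisksClient()
new_disk = compute_v1.Disk(
    name="resized-boot-disk",                                        # placeholder
    source_snapshot="projects/my-project/global/snapshots/my-snap",  # placeholder
    size_gb=200,  # larger than the snapshot's source disk
)
operation = disks.insert(
    project="my-project",       # placeholder project ID
    zone="us-central1-a",       # placeholder zone
    disk_resource=new_disk,
)
operation.result()  # block until the disk is ready
```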
Snapshots:
Good for backup and disaster recovery
Lower cost than images
Smaller size than images since they don't contain the OS, etc.
Differential backups - only the data changed since the last snapshot is stored
Faster to create than images
Snapshots are only available in the project they are created in (now it is possible to share them between projects)
Can be created from disks even while they are attached to running instances
Images:
Good for reusing compute engine instance states with new instances
Available across different projects
Can't be created for running instances (unless you use the --force flag; see the sketch below)
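As a footnote to the --force point above, the same escape hatch exists in the API. A minimal sketch, assuming the google-cloud-compute package, where force_create mirrors gcloud's --force flag; all names are placeholders:

```python
# Create an image from a disk that is still attached to a running
# instance; force_create acknowledges the disk may be inconsistent.
from google.cloud import compute_v1

images = compute_v1.ImagesClient()
operation = images.insert(
    project="my-project",  # placeholder project ID
    image_resource=compute_v1.Image(
        name="image-from-running-disk",                               # placeholder
        source_disk="projects/my-project/zones/us-central1-a/disks/my-disk",
    ),
    force_create=True,  # the API equivalent of --force on the CLI
)
operation.result()  # block until the image is created
```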
Snapshots primarily target backup and disaster recovery scenarios: they are cheaper and easier to create (they can often be taken without stopping the VM). They are meant for frequent, regular uploads and rare downloads.
Images are primarily meant for boot disk creation. They are optimized for many downloads of the same data over and over. If the same image is downloaded many times, every download after the first is going to be very fast (even for large images).
Images do not have to be used exclusively for boot disks; they can also be used for data that needs to be made quickly available to a large set of VMs (in a scenario where a shared read-only disk doesn't satisfy the requirements for whatever reason).
A snapshot is a copy of your disk that you can use to create a new persistent disk (PD) of any type (standard PD or SSD PD). You can use the snapshot to create a bigger disk, and you have the ability to create the new disk in any zone you might need. Pricing is a bit cheaper for the provisioned space used by a snapshot, and when used as backup, you can create differential snapshots.
When you use an existing disk to create an instance, you have to create the instance in the same zone where the disk exists and it will have the size of the disk.
An image resource is the pre-configured GCE operating system that you're running (CentOS, Debian, etc.). You can use the public images, which are available to all projects, use private images for a specific project, or create your own custom image.
A snapshot is locked within a project, but a custom image can be shared between projects.
Simply put, a snapshot is basically a backup of the data on the disk.
An important point is that snapshots are backed up differentially (smaller size).
They are used mostly for backup and DR.
An image contains a backup of the OS as well; custom images are also prepared to enforce organizational policies.
In terms of cloud computing, images are used to launch multiple instances with the same configuration, and snapshots are mostly for backup.