Shrink a Dataproc worker boot disk - google-compute-engine

Due to some mix-up during planning we ended up with several worker nodes running 23TB drives which are now almost completely unused (we keep data on external storage). As the drives are only wasting money at the moment, we need to shrink them to a reasonable size.
Using weresync I was able to fully clone the drive to a much smaller one but apparently you can't swap the boot drive in GCE (which makes no sense to me). Is there a way to achieve that or do I need to create new workers using the images? If so, is there any other config I need to copy to the new instance in order for it to be automatically joined to the cluster?

Dataproc does not support VM configuration changes in running clusters.
I would advise you to delete the old cluster and create a new one with the worker disk size that you need.
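As a rough sketch, recreating the cluster with a smaller worker boot disk can be done with the gcloud CLI. The cluster name, region, worker count, and disk size below are placeholders, not values from the original question:

```shell
# Delete the over-provisioned cluster (hypothetical name/region)
gcloud dataproc clusters delete my-cluster --region=us-central1

# Recreate it with a reasonably sized worker boot disk
gcloud dataproc clusters create my-cluster \
    --region=us-central1 \
    --num-workers=4 \
    --worker-boot-disk-size=500GB
```

Note that any data on the old workers' local disks is lost; this is only safe here because the question states the data lives on external storage.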

I ended up creating a ticket with GCP support - https://issuetracker.google.com/issues/120865687 - to get an official answer to that question. The answer was that this is not currently possible, but should be available shortly (within months) in the beta gcloud CLI, and possibly in the Console at a later date as well.
Went on with a complete rebuild of the cluster.

Related

1gb database with only two records

I identified an issue with an infrastructure I created on the Google Cloud Platform and would like to ask the community for help.
A charge was made that I did not expect, as theoretically it should have been almost impossible for me to exceed the limits of the free tier. But I noticed that my database is huge, at 1GB and growing, and there are some very heavy buckets.
My database is managed by a Django app, and in the tool's admin panel there are only 2 records in production. There were several backups and similar files, but I didn't create them.
Can anyone give me a guide on how to solve this?
I would assume that you manage the database yourself, i.e. it is not Cloud SQL. Databases pre-allocate files in larger chunks in order to be efficient on writes. You can check this yourself - write an additional 100k records; most likely the size will not change.
Go to Cloud SQL and it will say how many SQL instances you have. If you see the "create instance" window, that means you don't have any Google-managed SQL instances and the one we're talking about is standalone.
You can also check Deployment Manager to see whether you deployed one from the Marketplace or it's a standalone VM with MySQL installed manually.
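The two checks above can be sketched from the CLI, assuming the Cloud SDK is installed and authenticated:

```shell
# List Google-managed Cloud SQL instances;
# empty output means the database in question is self-managed
gcloud sql instances list

# List Deployment Manager deployments
# (a Marketplace MySQL install would show up here)
gcloud deployment-manager deployments list
```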
I made a simple experiment and deployed (from Marketplace) a MySQL 5.6 on top of Ubuntu 14.04 LTS. Initial DB size was over 110MB - but there are some records in it initially (schema, permissions etc) so it's not "empty" at all.
It is still weird that your DB is already 1GB. Try deploying a new one (if your use case permits). Maybe this is some old DB that was used for some purpose and all the content was deleted afterwards, but some "trash" still remains and takes up a lot of space.
Or - it may well be exactly as NeatNerd said - that the DB pre-allocates some space for performance and optimisation reasons.
If you provide more details, I can give you a better explanation.

Kubernetes cluster on GCE from Instances/Group

I have a Kubernetes computation cluster running on GCE, and I'm reasonably happy so far. I know that if I create a K8s cluster, I'll see the nodes as VM instances and the cluster as an instance group. I would like to go the other way around - create the instances/group first and make a K8s cluster out of them so it can be managed by Kubernetes. The reason I want to do this is to try to make the nodes preemptible, which might better fit my workload.
So the question is: how to build a Kubernetes cluster with preemptible nodes. Right now I can do either one or the other, but not both together.
There is a patch out for review at the moment (#12384) that adds a configuration option to mark the nodes in the instance group as preemptible. If you are willing to build from head, this should be available as a configuration option in the next couple of days. In the meantime, you can see from the patch how easy it is to modify the GCE startup scripts to make your VMs preemptible.
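For readers finding this later: on managed GKE this no longer requires patching. A hedged sketch, with made-up cluster and pool names:

```shell
# Add a node pool of preemptible VMs to an existing GKE cluster
gcloud container node-pools create preemptible-pool \
    --cluster=my-cluster \
    --zone=us-central1-a \
    --preemptible \
    --num-nodes=3
```

Preemptible nodes can be reclaimed at any time, so this fits batch/computation workloads better than latency-sensitive services.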

How do I make a snapshot of my boot disk?

I've read multiple times that I can cause read/write errors if I create a snapshot. Is it possible to create a snapshot of the disk my machine is booted off of?
It depends on what you mean by "snapshot".
A snapshot is not a backup; it is a way of temporarily capturing the state of a system so you can make changes, test the results, and revert to the previously known-good state if the changes cause issues.
How to take a snapshot varies depending on the OS you're using, whether you're talking about a physical or a virtual system, which virtualization platform you're using, what image types you're using for disks within a given virtualization platform, and so on.
Once you have a snapshot, you can make a real backup from the snapshot. If it's a database server, you'll want to make sure you've flushed everything to disk and then write-lock it for the time it takes to make the snapshot (typically seconds). For other systems you'll similarly need to address things in a way that ensures a consistent state.
If you want to make a complete backup of your system drive directly, rather than via a snapshot, then you want to shut down and boot off an alternate boot device like a CD or an external drive.
If you don't do that and try to directly back up a running system, you will be leaving yourself open to all manner of potential issues. It might work some of the time, but you won't know until you try to restore it.
If you can provide more details about the system in question, then you'll get more detailed answers.
As far as moving apps and data to different drives, data is easy provided you can shut down whatever is accessing the data. If it's a database, stop the database, move the data files, tell the database server where to find its files and start it up.
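As a sketch of moving database data to a different drive, for MySQL on a systemd-based Linux box (paths, mount point, and service name are assumptions that vary by distro):

```shell
# Stop the database so its files are quiescent
sudo systemctl stop mysql

# Move the data directory onto the new drive (hypothetical mount point)
sudo mv /var/lib/mysql /mnt/newdisk/mysql

# Point MySQL at the new location (datadir in my.cnf), then restart
sudo sed -i 's|^datadir.*|datadir = /mnt/newdisk/mysql|' /etc/mysql/my.cnf
sudo systemctl start mysql
```

On systems with AppArmor or SELinux you would also need to update the relevant policy to allow MySQL to read the new path.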
For applications, it depends. Often it doesn't matter and it's fine to leave it on the system disk. It comes down to how it's being installed.
It looks like that works a little differently. The first snapshot will create an entire copy of the disk and subsequent snapshots will act like ordinary snapshots. This means it might take a bit longer to do the first snapshot.
According to this, you ideally want to shut down the system before taking a snapshot of your boot disk. If you can't do that for whatever reason, then you want to minimize the amount of writes hitting the disk and then take the snapshot. Assuming you're using a journaling filesystem (ext3, ext4, xfs, etc.), it should be able to recover without issue.
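A minimal sketch of that minimize-writes-then-snapshot approach from inside the running VM (disk and snapshot names are placeholders):

```shell
# Flush dirty pages to disk to reduce inconsistency in the snapshot
sync

# Take the snapshot of the boot disk
gcloud compute disks snapshot my-boot-disk \
    --zone=us-central1-a \
    --snapshot-names=my-boot-snapshot
```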
You can use the GCE APIs. Use the Disks:insert API to create the persistent disk. There are code examples on how to start an instance using Python, and Google has client libraries for other programming languages like Java, PHP, and others.
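The CLI equivalent of that Disks:insert call, as a sketch with placeholder names and sizes:

```shell
# Create a new persistent disk (what Disks:insert does under the hood)
gcloud compute disks create my-data-disk \
    --size=200GB \
    --zone=us-central1-a \
    --type=pd-standard
```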

Google Compute Engine: what is the difference between disk snapshot and disk image?

I've been using both for my startup and to me, the functionality is the same. Until now, the instances I've been creating are only for computation. I'm wondering how GCE disk images and snapshots are different in terms of technology, and in which situation it is better to use one over the other.
A snapshot reflects the contents of a persistent disk in a concrete instant in time. An image is the same thing, but includes an operating system and boot loader and can be used to boot an instance.
Images and snapshots can be public or private. In the case of images, a public image may or may not be an official image provided by Google.
Snapshots are stored as diffs (a snapshot is stored relative to the previous one, though that is transparent to you) while images are not. They are also cheaper ($0.026 per GB/month vs $0.050 for images) (Snapshots are increasing to $0.050/GB/month on October 1, 2022).
These days the two concepts are quite similar. It's now possible to start an instance using a snapshot instead of an image, which is an easy way of resizing your boot partition. Using snapshots may be simpler for most cases.
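Resizing a boot partition that way can be sketched as follows (all names and sizes are made up for illustration):

```shell
# Create a larger boot disk from an existing snapshot
gcloud compute disks create new-boot-disk \
    --source-snapshot=my-boot-snapshot \
    --size=100GB \
    --zone=us-central1-a

# Boot a new instance from that disk
gcloud compute instances create new-instance \
    --zone=us-central1-a \
    --disk=name=new-boot-disk,boot=yes
```

Depending on the OS image, you may still need to grow the filesystem inside the guest after first boot.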
Snapshots:
Good for backup and disaster recovery
Lower cost than images
Smaller size than images, since they don't contain the OS, etc.
Differential backups - only the data changed since the last snapshot is recreated
Faster to create than images
Only available in the project where they were created (though it is now possible to share them between projects)
Can be created for disks even while they are attached to running instances
Images:
Good for reusing Compute Engine instance states with new instances
Available across different projects
Can't be created for running instances (unless you use the --force flag)
Snapshots primarily target backup and disaster recovery scenarios; they are cheaper and easier to create (they can often be taken without stopping the VM). They are meant for frequent, regular uploads and rare downloads.
Images are primarily meant for boot disk creation. They are optimized for repeated downloads of the same data. If the same image is downloaded many times, the downloads after the first are going to be very fast (even for large images).
Images do not have to be used exclusively for boot disks; they can also be used for data that needs to be made quickly available to a large set of VMs (in scenarios where a shared read-only disk doesn't satisfy the requirements for whatever reason).
A snapshot is a copy of your disk that you can use to create a new persistent disk (PD) of any type (standard PD or SSD PD). You can use the snapshot to create a bigger disk, and you can create the new disk in any zone you might need. Pricing is a bit cheaper for the provisioned space used by a snapshot. When used as backup, you can create differential snapshots.
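For completeness, creating an image from a disk looks roughly like this; the --force flag is what allows imaging a disk that is still attached to a running instance (names are placeholders):

```shell
gcloud compute images create my-image \
    --source-disk=my-boot-disk \
    --source-disk-zone=us-central1-a \
    --force
```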
When you use an existing disk to create an instance, you have to create the instance in the same zone where the disk exists and it will have the size of the disk.
When referring to image resources, an image is the pre-configured GCE operating system that you're running (CentOS, Debian, etc.). You can use the public images available to all projects, use private images for a specific project, or create your own custom image.
A snapshot is locked within a project, but a custom image can be shared between projects.
Simply put, a snapshot is basically a backup of the data on the disk.
An important point is that snapshots are backed up differentially (so they take less space).
They are used mostly for backup and DR.
An image contains a backup of the OS as well; custom images can also be prepared to enforce organizational policies.
In cloud computing terms, images are used to launch multiple instances with the same configuration, while snapshots are mostly for backup.

economical way of scaling a php+mysql website

My partner and I are trying to start a website hosted in the cloud. It has pretty heavy AJAX traffic, and the backend handles money transactions, so we need ACID guarantees in some of the DB tables.
Currently everything is running off a single server. Some of the AJAX traffic is cached in text files.
Question:
What's the best way to scale the database server? I thought about moving MySQL to separate instances and doing master-master replication. However, this seems tough, and I've heard I might lose ACID properties even with InnoDB? Is Amazon RDS a good solution?
The web server is relatively stateless except for some custom log files and the AJAX cache files. What's a good way to scale to multiple web servers? I guess the custom log files can be moved to a reliable shared file system or a DB, but I'm not sure what to do about AJAX cache file coherency across multiple servers. (I don't care about losing /var/log/* if a web server dies.)
For performance it might be cheaper to go with a larger instance with more cores and memory, but eventually I'll need redundancy, so I'm wondering what's the best way to do this cheaply.
thanks
Take a look at this post. There are plenty of presentations on the net discussing scalability. A few things I suggest keeping in mind:
Plan early for data sharding [even if you are not going to do it immediately]
Try using mechanisms like memcached to limit the number of queries sent to the database
Prepare to serve static content from another domain; in the longer run, from an nginx-like server and later a CDN
Redundancy depends on your needs. Is 'read-only' mode acceptable for your site? If so, go with MySQL replication + rsync of static files, and in case of failover have your site run in that mode until you recover the master node. If you need high availability, then look either at DRBD replication [at least for MySQL] or at a setup with automated promotion of a slave server to become the master node.
You might find the following interesting:
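A sketch of the read-only-plus-rsync half of that setup (host names and paths are made up for illustration):

```shell
# Put the degraded site into read-only mode at the MySQL level
mysql -e "SET GLOBAL read_only = ON;"

# Mirror static files from the master web server to the standby
rsync -az --delete /var/www/static/ standby-host:/var/www/static/
```

In practice you would run the rsync on a schedule (cron) so the standby is never far behind the master.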
http://yoshinorimatsunobu.blogspot.com/2011/08/mysql-mha-support-for-multi-master.html
http://mysqlperformanceblog.com
http://highscalability.com
http://google.com - search for scalability, lamp, failover... there are tons of case studies and horror stories from the trenches :-]
Another option is using a scalable platform such as Amazon Web Services. You can start out with a micro instance and configure load balancing to fire up more instances as needed.
Once you determine average resource requirements you can then resize your image to larger or smaller depending on your needs.
http://aws.amazon.com
http://tuts.pinehead.tv/2011/06/26/creating-an-amazon-ec2-instance-with-linux-lamp-stack/
http://tuts.pinehead.tv/2011/09/11/how-to-use-amazon-rds-relation-database-service-to-host-mysql/
Amazon allows you to either load balance or change instance size based off demand.