How do I make a snapshot of my boot disk? - google-compute-engine

I've read multiple times that I can cause read/write errors if I create a snapshot. Is it possible to create a snapshot of the disk my machine is booted off of?

It depends on what you mean by "snapshot".
A snapshot is not a backup; it is a way of temporarily capturing the state of a system so you can make changes, test the results, and revert to the previously known-good state if the changes cause issues.
How to take a snapshot varies depending on the OS you're using, whether you're talking about a physical or a virtual system, which virtualization platform you're using, what image types you're using for disks within that platform, and so on.
Once you have a snapshot, you can make a real backup from the snapshot. If it's a database server, you'll want to make sure you've flushed everything to disk and then write-lock it for the time it takes to make the snapshot (typically seconds). For other systems you'll similarly need to quiesce things in a way that ensures a consistent state.
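If that database is MySQL, for example, the flush-and-lock step could look roughly like the sketch below (this assumes the mysql-connector-python package, placeholder credentials, and a hypothetical take_snapshot() callable standing in for whatever actually creates the snapshot - LVM, your hypervisor, a cloud API, etc.):

    import mysql.connector  # pip install mysql-connector-python

    def consistent_mysql_snapshot(take_snapshot):
        # take_snapshot() is a placeholder for the real snapshot trigger.
        conn = mysql.connector.connect(host="localhost", user="root", password="secret")
        cur = conn.cursor()
        try:
            # Flush tables to disk and hold a global read lock; the lock
            # lasts only as long as this session stays open.
            cur.execute("FLUSH TABLES WITH READ LOCK")
            take_snapshot()  # typically takes only seconds
        finally:
            cur.execute("UNLOCK TABLES")
            cur.close()
            conn.close()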
If you want to make a complete backup of your system drive directly, rather than via a snapshot, then you want to shut down and boot off an alternate boot device like a CD or an external drive.
If you don't do that and try to back up a running system directly, you leave yourself open to all manner of potential issues. It might work some of the time, but you won't know until you try to restore it.
If you can provide more details about the system in question, then you'll get more detailed answers.
As far as moving apps and data to different drives, data is easy provided you can shut down whatever is accessing it. If it's a database, stop the database, move the data files, tell the database server where to find its files, and start it back up (a sketch of this follows below).
For applications, it depends. Often it doesn't matter and it's fine to leave them on the system disk; it comes down to how they're being installed.
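As a rough sketch of that data move for a typical Linux MySQL install (the service name, paths, and config file location are assumptions, not a recipe):

    import shutil
    import subprocess

    # Stop the database, move its data directory onto the new drive, then
    # point the server at the new location before starting it again.
    subprocess.run(["systemctl", "stop", "mysql"], check=True)
    shutil.move("/var/lib/mysql", "/mnt/data/mysql")
    # Now edit datadir in /etc/mysql/my.cnf (or a conf.d snippet) to read
    # datadir=/mnt/data/mysql, then bring the server back up.
    subprocess.run(["systemctl", "start", "mysql"], check=True)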
It looks like GCE snapshots work a little differently: the first snapshot creates a complete copy of the disk, and subsequent snapshots are incremental, only storing what has changed. This means the first snapshot might take a bit longer.
According to this, you ideally want to shut down the system before taking a snapshot of your boot disk. If you can't do that for whatever reason, then you want to minimize the amount of writes hitting the disk and then take the snapshot. Assuming you're using a journaling filesystem (ext3, ext4, XFS, etc.), it should be able to recover without issue.

You can use the GCE APIs. Use the Disks:insert API to create the persistent disk. There are code examples on how to start an instance using Python, but Google has client libraries for other programming languages like Java, PHP and others.
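A minimal sketch of triggering a boot-disk snapshot with the google-api-python-client (application-default credentials are assumed, and the project, zone, disk, and snapshot names are placeholders); the Disks:insert call follows the same request pattern:

    from googleapiclient import discovery  # pip install google-api-python-client

    # Assumes application-default credentials, e.g. from
    # `gcloud auth application-default login`.
    compute = discovery.build("compute", "v1")

    operation = compute.disks().createSnapshot(
        project="my-project",
        zone="us-central1-a",
        disk="my-boot-disk",
        body={"name": "my-boot-disk-snapshot"},
    ).execute()
    print("Started operation:", operation["name"])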

Related

Google Compute Engine: what is the difference between disk snapshot and disk image?

I've been using both for my startup and to me, the functionality is the same. Until now, the instances I've been creating are only for computation. I'm wondering how GCE disk images and snapshots are different in terms of technology, and in which situation it is better to use one over the other.
A snapshot reflects the contents of a persistent disk at a specific instant in time. An image is the same thing, but it includes an operating system and boot loader and can be used to boot an instance.
Images and snapshots can be public or private. In the case of images, public images may be official ones provided by Google or images published by others.
Snapshots are stored as diffs (a snapshot is stored relative to the previous one, though that is transparent to you) while images are not. They are also cheaper ($0.026 per GB/month vs $0.050 for images) (Snapshots are increasing to $0.050/GB/month on October 1, 2022).
These days the two concepts are quite similar. It's now possible to start an instance using a snapshot instead of an image, which is an easy way of resizing your boot partition. Using snapshots may be simpler for most cases.
Snapshots:
Good for backup and disaster recovery
Lower cost than images
Smaller size than images since it doesn't contain OS, etc.
Differential backups - only the data changed since the last snapshot is stored
Faster to create than images
Snapshots are only available in the project in which they are created (though it is now possible to share them between projects)
Can be created from disks even while they are attached to running instances
Images:
Good for reusing compute engine instance states with new instances
Available across different projects
Can't be created from running instances (unless you use the --force flag)
Snapshots primarily target backup and disaster recovery scenarios; they are cheaper and easier to create (they can often be taken without stopping the VM). They are meant for frequent, regular uploads and rare downloads.
Images are primarily meant for boot disk creation. They are optimized for downloading the same data many times over. After the first download, subsequent downloads of the same image are very fast (even for large images).
Images do not have to be used for boot disks exclusively; they can also be used for data that needs to be made quickly available to a large set of VMs (in a scenario where a shared read-only disk doesn't satisfy the requirements for whatever reason).
A snapshot is a copy of your disk that you can use to create a new persistent disk (PD) of any type (standard PD or SSD PD). You can use the snapshot to create a bigger disk, and you can create the new disk in any zone you might need. Pricing is a bit cheaper for the provisioned space used by a snapshot, and when used as a backup, you can create differential snapshots.
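For illustration, a sketch of that create-a-disk-from-a-snapshot step with the google-api-python-client - a bigger SSD disk in a different zone (all names and sizes here are made up):

    from googleapiclient import discovery

    compute = discovery.build("compute", "v1")

    operation = compute.disks().insert(
        project="my-project",
        zone="europe-west1-b",  # any zone you need, not just the original one
        body={
            "name": "restored-disk",
            "type": "zones/europe-west1-b/diskTypes/pd-ssd",  # standard or SSD PD
            "sizeGb": "500",  # may be larger than the source disk
            "sourceSnapshot": "projects/my-project/global/snapshots/my-snapshot",
        },
    ).execute()
    # After attaching the disk, grow the filesystem (e.g. resize2fs) so it
    # actually uses the extra space.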
When you use an existing disk to create an instance, you have to create the instance in the same zone where the disk exists and it will have the size of the disk.
Image resources refer to the pre-configured GCE operating system that you're running (CentOS, Debian, etc.). You can use the public images, which are available to all projects, use private images for a specific project, or create your own custom image.
A snapshot is locked within a project, but a custom image can be shared between projects.
Simply put, a snapshot is basically a backup of the data on the disk. An important point is that snapshots are backed up differentially (so they are smaller), and they are used mostly for backup and DR.
An image includes a backup of the OS as well; custom images are often prepared to enforce organizational policies.
In cloud computing terms, images are used to launch multiple instances with the same configuration, while snapshots are mostly for backup.

economical way of scaling a php+mysql website

My partner and I are trying to start a website hosted in cloud. It has pretty heavy ajax traffic and the backend handles money transactions so we need ACID in some of the DB tables.
Currently everything is running off a single server. Some of the AJAX traffic is cached in text files.
Question:
What's the best way to scale the database server? I thought about moving MySQL to separate instances and doing master-master replication. However, this seems tough, and I heard I might lose ACID properties even with InnoDB? Is Amazon RDS a good solution?
The web server is relatively stateless except for some custom log files and the AJAX cache files. What's a good way to scale to multiple web servers? I guess the custom log files can be moved to a reliable shared file system or the DB, but I'm not sure what to do about AJAX cache file coherency across multiple servers. (I don't care about losing /var/log/* if a web server dies.)
For performance it might be cheaper to go with a larger instance with more cores and memory, but eventually I would need redundancy, so I'm wondering what's the best way to do this cheaply.
thanks
Take a look at this post. There are plenty of presentations on the net discussing scalability. A few things I suggest keeping in mind:
plan early for data sharding [even if you are not going to do it immediately]
try using mechanisms like memcached to limit the number of queries sent to the database (a sketch follows after this list)
prepare to serve static content from another domain, in the longer run from an nginx-like server and later a CDN
redundancy - depends on your needs. Is 'read-only' mode acceptable for your site? If so, go with MySQL replication + rsync of static files, and in case of failover have your site work in that mode until you recover the master node. If you need high availability, then take a look either at DRBD replication [at least for MySQL] or at a setup with automated promotion of a slave server to become the master node.
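To illustrate the memcached point above, here is a minimal get-or-compute sketch (the site in question is PHP, but the pattern is identical; this assumes the pymemcache client and a hypothetical load_user_from_db() query):

    from pymemcache.client.base import Client  # pip install pymemcache

    cache = Client(("localhost", 11211))

    def get_user(user_id, load_user_from_db):
        # load_user_from_db() stands in for the real SQL query.
        key = f"user:{user_id}"
        cached = cache.get(key)
        if cached is not None:
            return cached.decode("utf-8")   # cache hit: no database query
        value = load_user_from_db(user_id)  # cache miss: one MySQL query
        cache.set(key, value.encode("utf-8"), expire=300)  # warm for 5 minutes
        return value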
You might find the following interesting:
http://yoshinorimatsunobu.blogspot.com/2011/08/mysql-mha-support-for-multi-master.html
http://mysqlperformanceblog.com
http://highscalability.com
http://google.com - search for scalability, lamp, failover... there are tons of case studies and horror stories from the trenches :-]
Another option is using a scaleable platform such as Amazon Web Services. You can start out with a micro instance and configure load balancing to fire up more instances as needed.
Once you determine your average resource requirements, you can then resize your instance to be larger or smaller depending on your needs.
http://aws.amazon.com
http://tuts.pinehead.tv/2011/06/26/creating-an-amazon-ec2-instance-with-linux-lamp-stack/
http://tuts.pinehead.tv/2011/09/11/how-to-use-amazon-rds-relation-database-service-to-host-mysql/
Amazon allows you to either load balance or change instance size based off demand.

Storing image files in Mongo database, is it a good idea?

When working with MySQL, it is a bad idea to store images as BLOBs in the database, as it makes the database quite large, which is harmful for normal usage of the database. It is then better to save image files on disk and save a link to them within the database.
However, I think this is different for MongoDB, as increasing the database file size has a negligible influence on performance (this is the reason that MongoDB can successfully handle billions of records).
Do you think it is better to save image files in MongoDB (as GridFS) to reduce the number of files stored on the server, or is it still better to keep the database as small as possible?
The problem isn't so much that the database gets big; databases can handle that (although MongoDB isn't as good as many others in that respect). The problem is that to send the data to the client, it first has to be moved into RAM by the database, then copied over to the application's memory, then handed off to the kernel to be sent through the socket. It wastes lots of RAM and CPU cycles. The reason it's better to have large files in the filesystem is that it's easier to avoid the copying: you can ask the kernel to stream the file from disk to the socket directly.
The downside of storing large files in the filesystem is that it's much harder to distribute. Using a database, and something like Mongo's GridFS makes it possible to scale out. You just have to make sure you don't copy the whole file into the application's memory at once, but a chunk at a time. Most web app frameworks have some support for sending chunked HTTP responses nowadays.
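A minimal pymongo/GridFS sketch of that chunk-at-a-time streaming (the database name and the write callback are placeholders):

    import gridfs
    from pymongo import MongoClient  # pip install pymongo

    db = MongoClient()["mydb"]  # hypothetical database name
    fs = gridfs.GridFS(db)

    def stream_file(file_id, write):
        # `write` is whatever sends bytes to the client, e.g. a framework's
        # chunked-response writer. Only one ~255 kB chunk is in memory at a time.
        grid_out = fs.get(file_id)
        for chunk in grid_out:  # GridOut iterates chunk by chunk
            write(chunk)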
The answer is yes. Back in the old cave-man days, servers had mutable file systems you could change. This was great till we tried to scale things.
Cave-people nowadays build apps with immutable deployments. Heroku and Dokku are examples of this. Because the web app server has no state, they can be created, upgraded, scaled, and destroyed easily.
Since we still have files, we need to put them somewhere. There are several solutions: NFS, our database, or someone else's database.
NFS is a 'network file system' which lets you do file I/O on network resources. If you're dealing with the network anyway, IMHO it doesn't add much value unless it's what you know already.
Our database - For MongoDB there are two options: (file > 16 MB) ? GridFS : BinData (see the sketch after this list)
Someone else's database - some are basic, like Amazon S3, and some offer extra services, like Cloudinary or Dropbox.
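A sketch of that GridFS-vs-BinData choice with pymongo (the collection and field names are made up, and the threshold leaves headroom under the 16 MB BSON document cap):

    import gridfs
    from bson import Binary
    from pymongo import MongoClient

    db = MongoClient()["mydb"]  # hypothetical database name
    fs = gridfs.GridFS(db)

    def save_image(name, data: bytes):
        # Small files fit inline as BinData; big ones go to GridFS because a
        # single BSON document is capped at 16 MB.
        if len(data) > 15 * 1024 * 1024:
            return fs.put(data, filename=name)
        return db.images.insert_one(
            {"filename": name, "data": Binary(data)}
        ).inserted_id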
If you're on a big-budget enterprise team and someone spends 40 hrs a week taking care of servers, then sure - use the file system. If you're building web apps that scale, putting files in the DB makes sense.
If you're concerned about performance:
1) Use a proxy (e.g. nginx) or a CDN to host your content for clients. Your server should just be serving cache misses.
2) Use streaming I/O; Nodeschool has a cool tutorial on this for Node.js.
Storing images is not a good idea in any DB, because:
read/write to a DB is always slower than a filesystem
your DB backups grow to be huge and more time consuming
access to the files now requires going through your app and DB layers
The last two are the real killers.
Source: Three things you should never put in your database.
So if you can make your application handle it, then it's better not to upload your pictures to MongoDB.
However, if you are close to a deadline... and the database will be so small that it won't grow much and its size will never exceed the available RAM on the machine running your application, then I think (as opposed to the author of the cited article) you may consider storing the images in MongoDB. It's simple, convenient, quick to implement and gives you some flexibility.
MongoDB's GridFS is designed for this sort of storage and is quite handy for storing image files across many different servers in a way that all servers can use them.

Sharing storage between servers

I have a PHP based web application which is currently only using one webserver but will shortly be scaling up to another. In most regards this is pretty straightforward, but the application also stores a lot of files on the filesystem. It seems that there are many approaches to sharing the files between the two servers, from the very simple to the reasonably complex.
These are the options that I'm aware of
Simple network storage
NFS
SMB/CIFS
Clustered filesystems
Lustre
GFS/GFS2
GlusterFS
Hadoop DFS
MogileFS
What I want is for a file uploaded via one webserver be immediately available if accessed through the other. The data is extremely important and absolutely cannot be lost, so whatever is implemented needs to a) never lose data and b) have very high availability (as good as, or better, than a local filesystem).
It seems like the clustered filesystems would also provide faster data access than local storage (for large files), but that isn't of vital importance at the moment.
What would you recommend? Do you have any suggestions to add or anything specifically to look out for with the above options? Any suggestions on how to manage backup of data on the clustered filesystems?
You can look at the Mirror File System, which replicates files between servers in real time.
It's very easy to install and set up. One mount command does it, and you can have an HA, load balancing and backup solution in less than 10 minutes.
http://www.TwinPeakSoft.com/
Fish.Ada
It looks like the clustered filesystems are the best bet. Backup can be done as for any other filesystem, although, with most of them having built-in redundancy, they are already more reliable than a standard filesystem.

Full complete MySQL database replication? Ideas? What do people do?

Currently I have two Linux servers running MySQL, one sitting on a rack right next to me under a 10 Mbit/s upload pipe (main server) and another some couple of miles away on a 3 Mbit/s upload pipe (mirror).
I want to be able to replicate data on both servers continuously, but have run into several roadblocks. One of them is that, under MySQL master/slave configurations, every now and then some statements drop (!), meaning some people logging on to the mirror URL don't see data that I know is on the main server, and vice versa. Let's say this happens on a meaningful block of data once every month, so I can live with it and assume it's a "lost packet" issue (i.e., god knows, but we'll compensate).
The other most important (and annoying) recurring issue is that, when for some reason we do a major upload or update (or reboot) on one end and have to sever the link, LOAD DATA FROM MASTER doesn't work and I have to manually dump on one end and upload on the other - quite a task nowadays, moving some 0.5 TB worth of data.
Is there software for this? I know MySQL (the "corporation") offers this as a VERY expensive service (full database replication). What do people out there do? The way it's structured, we run an automatic failover where if one server is not up, then the main URL just resolves to the other server.
We at Percona offer free tools to detect discrepancies between master and slave, and to get them back in sync by re-applying minimal changes.
pt-table-checksum
pt-table-sync
GoldenGate is a very good solution, but probably as expensive as the MySQL replicator.
It basically tails the journal and applies changes based on what's committed. They support bi-directional replication (a hard task), and replication between heterogeneous systems.
Since they work by processing the journal file, they can do large-scale distributed replication without affecting performance on the source machine(s).
I have never seen dropped statements, but there is a bug where network problems could cause relay log corruption. Make sure you don't run MySQL without this fix.
Documented in the 5.0.56, 5.1.24, and 6.0.5 changelogs as follows:
Network timeouts between the master and the slave could result in corruption of the relay log.
http://bugs.mysql.com/bug.php?id=26489