Primary Disk vs Swap Disk [closed] - google-compute-engine

I am using Google Compute Engine with a 90GB SSD. As my site has grown, the cost has also shot up. I tried shifting to https://www.vpb.com, but they offered me a
30GB primary disk and a 60GB swap disk (both SSDs, according to them).
The proposed cost is also about 50% lower. My RAM is just 8GB.
Is the above configuration different from a 90GB SSD disk in Google Compute Engine?

Is the above configuration different from a 90GB SSD disk in Google Compute Engine?
Yes. Google Compute Engine is a full-featured IaaS platform where you create VMs with the disks (and sizes) you need. Persistent Disks are designed to be reliable, support easy snapshots, and can be resized while the VM is running.
This other provider might be giving you two different disks for their VM or dedicated machine, and you will have to design your site to use them both. Swap is really only meant as overflow for RAM, and it's strange to see it offered separately like that. Those disks may also be physically attached to the machine rather than backed by reliable storage like GCP's persistent disks.
If 90GB isn't enough on your GCP VM, how will 30+60 be enough on this other machine? Are you uploading large media files? You might be better served by putting those files in Cloud Storage or S3.
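If you do go the object-storage route for media, moving existing files is a one-liner with gsutil (the path and bucket name below are placeholders):

    # copy existing uploads into a Cloud Storage bucket (hypothetical path and bucket)
    gsutil -m cp -r /var/www/html/uploads gs://my-site-media/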

As mentioned above, there are two important things to understand:
Swap + disk is not the same as one big disk. Swap is essentially cheap, slow stand-in RAM for when you're running low on real memory. If you have 60+GB of static data on your VM, a 30GB disk is less than half of what you actually need.
Avoid using the disk for static data in the first place; images, for example, can be served from object storage, which is far cheaper.
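To make the distinction concrete: on Linux, swap is just a file or partition the kernel pages memory out to, and enabling it adds no usable filesystem space. A minimal sketch (the file path and size are arbitrary):

    # create and enable a 4GB swap file (size is arbitrary)
    sudo fallocate -l 4G /swapfile
    sudo chmod 600 /swapfile
    sudo mkswap /swapfile
    sudo swapon /swapfile
    free -h    # swap shows up here as memory overflow
    df -h      # note: the swap file consumes 4GB of disk; it adds no storage capacity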

Disclosure: I am a product manager on Google Cloud Platform (but not Google Compute Engine or Persistent Disk specifically).
30GB primary disk and 60GB swap disk (both SSDs, according to them).
The proposed cost is also about 50% lower. My RAM is just 8GB.
Is the above configuration different from a 90GB SSD disk in Google Compute Engine?
Note that a "disk in a machine" is very different from a Google Compute Engine persistent disk:
A "disk in a machine" is exactly that: a single physical device. If it fails, you are expected to have made a backup of it prior to failure. How you make the backup is up to you.
A Google Compute Engine persistent disk is replicated, so a single disk failure will not cause you to lose data. Making backups (snapshots) of your persistent disk is still highly recommended, and you can use Google Cloud Storage for this purpose, but those backups are typically for protecting against application bugs, not for persistent disk durability.
As another answer says, GCE persistent disks also support live resizing, so you can easily increase their size if needed.
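For reference, both operations are single gcloud commands; a minimal sketch with placeholder disk, zone and snapshot names:

    # snapshot a persistent disk
    gcloud compute disks snapshot my-data-disk --zone=us-central1-a --snapshot-names=my-data-backup
    # grow the disk while the VM keeps running, then grow the filesystem inside the guest
    gcloud compute disks resize my-data-disk --zone=us-central1-a --size=120GB
    sudo resize2fs /dev/sdb    # assumes an ext4 filesystem directly on /dev/sdb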
Google Cloud Platform has many more services besides just VMs: databases, key-value storage, object/blob storage, etc. so there's more to consider when making your decision.

Related

Share an SSD or RAM disk between Google Compute Engine VMs

From the Google documentation it is clear that a read-only Persistent Disk (PD) can be shared between multiple instances (Google Compute VMs), but is it somehow possible to share a local SSD or RAM disk across multiple VMs?
Local SSDs are physically attached to a single host and, like RAM, they are not read-only, so they cannot be shared the way a read-only persistent disk can.
So this question probably answers itself.
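For completeness, the read-only sharing the documentation describes is just the --mode flag when attaching the disk; a minimal sketch with placeholder instance, disk and zone names:

    # attach one persistent disk to several VMs in read-only mode
    gcloud compute instances attach-disk vm-1 --disk=shared-data --mode=ro --zone=us-central1-a
    gcloud compute instances attach-disk vm-2 --disk=shared-data --mode=ro --zone=us-central1-a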

How can I tell the raw size of a MySQL DB snapshot in Amazon RDS?

When I launch an Amazon MySQL database instance in RDS, I choose the amount of Allocated Storage for it.
When I create a snapshot (either manually or via the automatic backup), the "Storage" field shows the same size as the storage allocated to the instance, even though my database never reached that size.
Since Amazon's pricing (and the free tier) depends on the amount of storage used, I would like to know the real storage I'm using, rather than the amount allocated for the original database.
From looking at the Account Activity, and from knowing how mysqldump works, I would guess the snapshot does not really include the empty allocated space.
I was interested in the answer to this question and a Google search brought me here. I was surprised to see that although there is an accepted, upvoted answer, it does not actually answer the question that was asked.
The question asked is:
How can I tell the raw size of a MySQL DB snapshot in Amazon RDS?
However, the accepted answer is actually the answer to this question:
Am I charged for the allocated size of the source database when I take an RDS snapshot from it?
As to the original question, AFAICT there is no API or console function to determine the storage actually used by an RDS snapshot. The DBSnapshot resource has allocated_storage (Ruby, Java), but this returns the maximum storage size requested when the database was created, which mirrors what the AWS RDS console shows.
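The same limitation shows up on the command line: the AWS CLI exposes only the allocated size, not the bytes actually consumed. A minimal sketch (the snapshot identifier is a placeholder):

    aws rds describe-db-snapshots \
        --db-snapshot-identifier my-snapshot \
        --query 'DBSnapshots[0].AllocatedStorage'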
One might have thought this would be broken out on the AWS bill, but the bill provides very little detail for RDS.
The S3 portion of the bill is even less helpful.
Conclusion: there is no way to tell the raw size of a MySQL DB snapshot in Amazon RDS.
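If what you are really after is how much data the database itself holds (as opposed to what the snapshot reports), information_schema gives a rough figure; a minimal sketch, with a placeholder endpoint and user:

    mysql -h <rds-endpoint> -u admin -p -e "
      SELECT table_schema,
             ROUND(SUM(data_length + index_length) / 1024 / 1024, 1) AS size_mb
      FROM   information_schema.tables
      GROUP  BY table_schema;"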
RDS storage is backed by EBS volumes, according to the FAQ:
Amazon RDS uses EBS volumes for database and log storage.
EBS doesn't store empty blocks, according to its pricing page:
Because data is compressed before being saved to Amazon S3, and Amazon EBS does not save empty blocks, it is likely that the snapshot size will be considerably less than your volume size.
Snapshots also consume space only for blocks changed since the previous snapshot, according to the EBS details page:
If you have a device with 100 GB of data but only 5 GB has changed after your last snapshot, a subsequent snapshot consumes only 5 additional GB and you are billed only for the additional 5 GB of snapshot storage, even though both the earlier and later snapshots appear complete.
RDS backups are block-level snapshots of the full virtual machine; no mysqldump is involved at all. Given this, each of your snapshots will use exactly the same amount of storage as your production instance at the moment the backup took place.

Storage options for diskless servers [closed]

I am trying to build a neural network simulation running on several high-CPU diskless instances. I am planning to use a persistent disk to store my simulation code and training data and mount it on all server instances. It is basically a map-reduce kind of task (several nodes work on the same training data, and the results of all nodes need to be collected into one single results file).
My only question now is: what are my options to (permanently) save the simulation results of the different servers, either at some points during the simulation or once at the end? Ideally, I would love to write them to the single persistent disk mounted on all servers, but this is not possible because I can only mount it read-only when it's attached to more than one server.
What is the smartest (and cheapest) way to collect all simulation results of all servers back to one persistent disk?
Google Cloud Storage is a great way to permanently store information in the Google Cloud. All you need to do is enable that product for your project, and you'll be able to access Cloud Storage directly from your Compute Engine virtual machines. If you create your instances with the 'storage-rw' service account, access is even easier because you can use the gsutil command built into your virtual machines without needing to do any explicit authorization.
To be more specific, go to the Google Cloud Console, select the project with which you'd like to use Compute Engine and Cloud Storage and make sure both those services are enabled. Then use the 'storage-rw' service account scope when creating your virtual machine. If you use gcutil to create your VM, simply add the --storage_account_scope=storage-rw (there's also an intuitive way to set the service account scope if you're using the Cloud Console to start your VM). Once your VM is up and running you can use the gsutil command freely without worrying about doing interactive login or OAuth steps. You can also script your usage by integrating any desired gsutil requests into your application (gsutil will also work in a startup script).
More background on the service account features of GCE can be found here.
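gcutil has since been superseded by gcloud, but the idea is unchanged; a minimal sketch with placeholder instance and bucket names (storage-rw is the scope alias referred to above):

    # create the VM with read/write access to Cloud Storage
    gcloud compute instances create sim-node-1 --zone=us-central1-a --scopes=storage-rw
    # later, from inside the VM, push results without any interactive auth
    gsutil cp /tmp/results.csv gs://my-simulation-results/sim-node-1/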
Marc's answer is definitely best for long-term storage of results. Depending on your I/O and reliability needs, you can also set up one server as an NFS server, and use it to mount the volume remotely on your other nodes.
Typically, the NFS server would be your "master node", and it can serve both binaries and configuration. Workers would periodically re-scan the directories exported from the master to pick up new binaries or configuration. If you don't need a lot of disk I/O (you mentioned neural simulation, so I'm presuming the data set fits in memory, and you only output final results), it can be acceptably fast to simply write your output to NFS directories on your master node, and then have the master node backup results to some place like GCS.
The main advantage of using NFS over GCS is that NFS offers familiar filesystem semantics, which can help if you're using third-party software that expects to read files off filesystems. It's pretty easy to sync down files from GCS to local storage periodically, but does require running an extra agent on the host.
The disadvantages of setting up NFS are that you probably need to sync UIDs between hosts, that NFS can be a security hole (I'd only expose NFS on my private network, not to anything outside 10/8), and that it requires installing additional packages on both client and server to set up the shares. Also, NFS will only be as reliable as the machine hosting it, while an object store like GCS or S3 is implemented with redundant servers and possibly even geographic diversity.
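For reference, a bare-bones NFS setup on Debian/Ubuntu looks roughly like this (hostnames, paths and the network range are placeholders):

    # on the master node
    sudo apt-get install nfs-kernel-server
    sudo mkdir -p /srv/results
    echo '/srv/results 10.240.0.0/16(rw,sync,no_subtree_check)' | sudo tee -a /etc/exports
    sudo exportfs -ra
    # on each worker
    sudo apt-get install nfs-common
    sudo mkdir -p /mnt/results
    sudo mount -t nfs master-node:/srv/results /mnt/results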
If you want to stay in the Google product space, how about Google Cloud Storage?
Otherwise, I've used S3 and boto for these kinds of tasks.
As a more general option, you're asking for some sort of object store. Google, as noted in previous responses, makes a nice package, but nearly all cloud providers offer some storage option. Make sure your cloud provider has BOTH key options: a volume store (storage that behaves like a virtual disk) and an object store (a key/value store). Both have their strengths and weaknesses. Volume stores are drop-in replacements for virtual disks; if you can use stdio, you can likely use a remote volume store. The problem is that they usually have the structure of a disk, and if you want anything more than that, you're asking for a database. The object store is a middle ground between the disk and the database: it's fast and semi-structured.
I'm an OpenStack user myself: first, because it provides both storage families, and second, because it's supported by a variety of vendors, so if you decide to move from vendor A to vendor B, your code can remain unchanged. You can even run a copy of it on your own machines (go to www.openstack.org). Note, however, that OpenStack likes memory; you're not going to run your private cloud on a 4GB laptop! Consider two 16GB machines.

Performance effects of moving mysql db to another Amazon EC2 instance

We have an EC2 instance running both Apache and MySQL at the moment. I am wondering whether moving MySQL to another EC2 instance will increase or decrease the performance of the site. I am mostly worried about network speed issues between the two instances.
EC2 instances in the same availability zone are connected via a 10 Gbps (10,000 Mbps) network, which is faster than a good solid-state drive on a SATA-3 interface (6 Gb/s).
You won't see any performance drop by moving the database to another server; in fact you'll probably see a performance increase because each server then has its own memory and CPU cores.
If your worry is network latency then forget about it - not a problem on AWS in the same availability zone.
Another consideration is that you're probably storing your website and DB files on an EBS-mounted volume. That EBS volume is stored off-instance, so your data is already traveling over that same super-fast 10 Gbps network to a storage array.
So what I'm saying is: with EBS, your website and database are already talking across the network to get their data, so putting them on separate instances won't really change anything in that respect, besides giving more resources to both servers. More resources means more data cached locally in memory and more performance.
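If you do split them, the only real changes are letting MySQL listen beyond localhost and granting the web server's user access from its private IP; a minimal sketch with placeholder database name, user, IP and password (pre-MySQL-8 GRANT syntax; the config path varies by distro):

    # on the database instance: listen on the private interface
    sudo sed -i 's/^bind-address.*/bind-address = 0.0.0.0/' /etc/mysql/my.cnf
    sudo service mysql restart
    # allow the web server's private IP to connect
    mysql -u root -p -e "GRANT ALL ON myapp.* TO 'webapp'@'10.0.0.5' IDENTIFIED BY 'changeme';"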
The answer depends largely on what resources Apache and MySQL are using. They can happily cohabit if demand on your website is low and each is configured with enough memory that it doesn't spill into virtual memory; in that case, they are best kept together.
As traffic grows, or your application grows, you will benefit from splitting them out, because both can then run with dedicated memory. Provided the instances are in the same region, you should see fast performance between them. I have even run a web application in Europe with the DB in the USA and the performance wasn't noticeably bad! I wouldn't recommend that, though!
Because AWS is easy and cheap, your best bet is to set it up and benchmark it!
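A quick way to compare the two setups is to load-test the site and the database separately before and after the move; a rough sketch using ab (Apache Bench) and mysqlslap, with placeholder hosts and users:

    # HTTP throughput of the site
    ab -n 1000 -c 20 http://my-site.example/
    # raw MySQL throughput against the remote DB host
    mysqlslap --host=db-host --user=bench -p --concurrency=20 --iterations=10 --auto-generate-sql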

What are the technical considerations for running MSSQL and MySQL off the same Windows server?

Our application runs on MySQL, but a client has another application that requires SQL Server. We want to run them both on the same box. What are the technical considerations for this configuration... is it really just a matter of making sure there are no port conflicts?
They will also contend for the hard drive, memory and CPU, the first two more than the last. If both are actively used, they will both want to read from and write to the disk. You may want to consider putting each on its own drive if they are heavily used.
Test it out and see how the performance fares. For more detail, open Performance Monitor and look at the disk and memory stats.
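On the port question specifically: MySQL defaults to 3306 and SQL Server to 1433, so out of the box they don't collide. You can confirm what each is listening on from a Windows command prompt; a minimal sketch:

    netstat -ano | findstr ":3306"
    netstat -ano | findstr ":1433"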