We are Google Cloud SQL users.
We have a table in Cloud SQL that is approximately 400 GB.
The maximum storage size of an instance is 500 GB.
We estimate the table will grow to about 2 TB by the end of this year.
We want to create multiple instances to handle this huge table.
Can we allocate more instances for this table?
Please advise.
I'll leave sharding strategies to other answers; I just want to provide an alternative. Google Cloud Platform has some other solutions that might help you scale:
Datastore is great for denormalised application storage with fast access.
BigQuery is great for advanced offline analytics.
If you are willing to manage your own database, you can run MySQL in GCE with any size disk supported by GCE.
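If offline analytics is your main workload, a minimal sketch of querying a large table from BigQuery with the google-cloud-bigquery Python client might look like the following. The dataset and table names are hypothetical, and this assumes a GCP project with the BigQuery API enabled and default credentials configured:

```python
# A minimal sketch, assuming the google-cloud-bigquery library is installed
# and application-default credentials are set up. "my_dataset.big_table" is
# a hypothetical table name.
from google.cloud import bigquery

client = bigquery.Client()  # picks up the default project from credentials

sql = """
    SELECT region, COUNT(*) AS row_count
    FROM `my_dataset.big_table`
    GROUP BY region
    ORDER BY row_count DESC
"""

for row in client.query(sql).result():
    print(row.region, row.row_count)
```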
I'm starting a project where a Cloud SQL instance would be a great fit; however, I've noticed they are twice the price of the same-specification VM on GCP.
I've been told by several devops guys I work with that they are billed by usage only, which would be perfect for me. However, their pricing page states "Instance pricing for MySQL is charged for every second that the instance is running".
https://cloud.google.com/sql/pricing#2nd-gen-pricing
I also see several people around the web saying they are usage only.
Cloud SQL or VM Instance to host MySQL Database
Am I interpreting Google's pricing pages incorrectly?
Am I going to be billed for the instance being on or for its usage?
Billed by usage
It all depends on what you mean by usage. When you run a Cloud SQL instance, it's like a server (Compute Engine): until you stop it, you pay for it. It's not pay-per-request pricing, as you can have with BigQuery.
With Cloud SQL you also pay for the storage that you use, and the storage can grow automatically with usage. Be careful: the storage can't be reduced, even if you delete data in the database!
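To make that billing model concrete, here is a rough back-of-the-envelope calculation in Python. The hourly instance rate and per-GB storage rate below are placeholder assumptions, not current list prices:

```python
# Rough Cloud SQL monthly cost estimate. The rates below are illustrative
# placeholders; check the pricing page for the real numbers for your region.
instance_rate_per_hour = 0.0626   # assumed hourly rate for the instance tier
storage_rate_per_gb_month = 0.17  # assumed rate per GB of provisioned storage

hours_running = 24 * 30           # billed for every second it runs: ~720 h/month
storage_gb = 100                  # storage only ever grows, so plan for the peak

monthly_cost = (instance_rate_per_hour * hours_running
                + storage_rate_per_gb_month * storage_gb)
print(f"Estimated monthly cost: ${monthly_cost:.2f}")
```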
Price is twice that of a similar Compute Engine instance
True! An n1-standard-1 Compute Engine instance is about $20 per month, and the same configuration on Cloud SQL is about $45.
BUT, what about the cost of managing your own SQL instance?
You have to update/patch the OS
You have to update/patch the DB engine (MySQL or PostgreSQL)
You have to manage security/network access
You have to take snapshots and ensure that restores actually work
You have to ensure high availability (people on call in case of server issues)
You have to tune the database parameters
You have to watch your storage and increase it when needed
You have to set up your replicas manually
Is it worth twice the price? For me, yes. It all depends on your skills and your own judgement.
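One way to sanity-check that trade-off is a simple break-even calculation. The $20 and $45 figures come from this answer; the ops time and hourly rate are assumptions you should replace with your own:

```python
# Break-even sketch: managed Cloud SQL premium vs. self-managed MySQL on GCE.
# The $20/$45 figures come from the answer above; the ops numbers are assumptions.
self_managed_per_month = 20      # n1-standard-1 Compute Engine, per the answer
cloud_sql_per_month = 45         # comparable Cloud SQL instance, per the answer
ops_hours_per_month = 2          # assumed time spent patching, backups, tuning
ops_hourly_rate = 50             # assumed cost of an engineer-hour

managed_premium = cloud_sql_per_month - self_managed_per_month
self_managed_overhead = ops_hours_per_month * ops_hourly_rate
print(f"Managed premium: ${managed_premium}/month, "
      f"self-managed ops cost: ${self_managed_overhead}/month")
```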
There are a lot of hidden configuration options, each of which can quickly halve your costs when adjusted.
Practically speaking, GCP's SQL product only works by running 24/7; there is no time-based "by usage" option, short of manually stopping and restarting the instance yourself.
There are a lot of tricks you can follow to lower costs; you can read many of them here: https://medium.com/@the-bumbling-developer/can-you-use-google-cloud-platform-gcp-cheaply-and-safely-86284e04b332
There are only 7 performance tiers in Google Cloud SQL (D0, D1, D2, D4, D8, D16, D32), and RAM maxes out at 16 GB (D32), as they are based on Google Compute Engine (GCE) machine types (1).
By comparison, Amazon has 13 performance tiers, and db.r3.8xlarge's RAM maxes out at 244 GB (2).
So my question is, what is the rough equivalent performance tier in AWS RDS for MySQL for a Google Cloud SQL's D32 tier?
Disclaimer: I am new to Google Cloud SQL. I only started using Cloud SQL because I began a new job that is 100% Google Cloud. Previously I had been an AWS user since the early days.
The D0-D32 Cloud SQL tiers are not based on GCE VMs, so a direct comparison is not straightforward. Note that the storage for D0-D32 is replicated geographically, and that makes writes a lot slower. The ASYNC mode improves the performance of small commits. The upside is that instances can be relocated quickly between locations that are far apart.
Connectivity for Cloud SQL is also different from RDS. RDS can be accessed using IPs, and the latency is comparable to VMs talking over local IPs. Cloud SQL uses only external IPs, which makes the latency from GCE higher (~1.25 ms) but provides a slightly better experience for connections coming from outside Google Cloud, because the TCP connections are terminated closer to the clients.
That being said, from a memory point of view, db.m3.xlarge on RDS is the closest match for the D32 on Cloud SQL. If the working set fits in memory, performance for some queries will be similar.
Currently in alpha, there is a new Cloud SQL feature that offers performance comparable to GCE machine types.
A Google Cloud SQL Performance Class instance is a new Google Cloud SQL instance type that provides better performance and more storage capacity, and is optimized for heavy workloads running on Google Compute Engine. The new instance type is an always-on, long-lived database as compared to a Standard Class instance type. Performance Class instances are based on tiers that correspond to Google Compute Engine (GCE) standard or highmem machine types.
Link: https://cloud.google.com/sql/docs/getting-started-performance-class
Anyway, very good question. Comparing prices with AWS, I found a huge difference in resources for the smallest instances at roughly the same price:
Cloud SQL, D0 = $0.025 per hour (0.128 GB RAM + "an appropriate amount of CPU")
AWS, db.t2.micro = $0.02 per hour (1 GB RAM + 1 vCPU)
For 1 GB of RAM in Cloud SQL, one would have to pay $0.19 per hour. Unfortunately, Google does not specify anything about SSD storage, which is very important for a performance comparison.
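Expressed as price per GB of RAM, the gap is easy to see. A quick calculation using only the figures quoted above:

```python
# Price per GB of RAM, using the figures quoted in this answer.
cloud_sql_d0 = {"price_per_hour": 0.025, "ram_gb": 0.128}
aws_db_t2_micro = {"price_per_hour": 0.02, "ram_gb": 1.0}

for name, tier in [("Cloud SQL D0", cloud_sql_d0),
                   ("AWS db.t2.micro", aws_db_t2_micro)]:
    per_gb = tier["price_per_hour"] / tier["ram_gb"]
    print(f"{name}: ${per_gb:.3f} per GB of RAM per hour")
```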
Not sure how to ask this question, but as I understand it, Google Cloud SQL supports the idea of instances, which are located throughout Google's global infrastructure... so I can have a single database spread across multiple instances all over the world.
Our app serves a few geographic regions... the data doesn't really need to be aggregated as a whole and could be stored individually in separate databases per region.
Does it make sense to serve all regions off one database/multiple instances? Or should I segregate each region into its own database and host the data the old-fashioned way?
If by “scaling” you mean memory size, then you can start with a smaller instance (less RAM) and move up to a more powerful instance (more RAM) later.
But if you mean more operations per second, there is a maximum size and a maximum number of operations that one Cloud SQL instance can support. You cannot infinitely scale one instance. Internally, the data for one instance is indeed stored over multiple machines, but that is more related to reliability and durability, and it does not scale the throughput beyond a certain limit.
If you really need more throughput than one Cloud SQL instance can provide, and you do need SQL-based storage, you'll have to use multiple instances (i.e. completely separate databases), but your app will have to manage them itself.
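As a rough illustration of what "your app will have to manage them" means, here is a hedged sketch of hash-based routing across several Cloud SQL instances. The hostnames are hypothetical, `connect` stands in for whatever MySQL driver you use, and real sharding also has to handle rebalancing, cross-shard queries, and migrations:

```python
# Minimal sketch of application-level sharding across separate Cloud SQL
# instances. Hostnames are hypothetical; "connect" stands in for whatever
# MySQL driver you use (e.g. mysql-connector or PyMySQL).
import hashlib

SHARD_HOSTS = [
    "10.0.0.11",  # cloud-sql-shard-0 (hypothetical)
    "10.0.0.12",  # cloud-sql-shard-1 (hypothetical)
    "10.0.0.13",  # cloud-sql-shard-2 (hypothetical)
]

def shard_for(customer_id: str) -> str:
    """Map a customer id to one shard, stably, via a hash."""
    digest = hashlib.sha256(customer_id.encode()).hexdigest()
    return SHARD_HOSTS[int(digest, 16) % len(SHARD_HOSTS)]

# The application must route every query to the right instance itself:
host = shard_for("customer-42")
print(f"customer-42 lives on {host}")
# conn = connect(host=host, user="app", password="...", database="orders")
```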
Note that the advantages of Cloud SQL go beyond just scalability. Cloud SQL instances are managed for you (e.g. failover, backups, etc. are taken care of), and you get billing based on usage.
(Cloud SQL team)
First, regarding the overall architecture: An "instance" in Google Cloud SQL is essentially one MySQL database server. There is no concept of "one database/multiple instances". Think of your Cloud SQL "instance" as the database itself. At any point in time, the data from a Cloud SQL instance is served out from one location -- namely where your instance happens to be running at that time. Now, if your app is running in Google App Engine or Google Compute Engine, then you can configure your Cloud SQL instance so that it is located close to your app.
Regarding your question of one database vs. multiple databases: If your database is logically one database and is served by one logical app, then you should probably have one Cloud SQL instance. (Again, think of one Cloud SQL instance as one database). If you create multiple Cloud SQL instances, they will be siloed from one another, and your app will have to do all the complex logic of managing them as completely different databases.
(Google Cloud SQL team)
I've read that Google Cloud SQL is recommended for small to medium-sized applications. I was wondering whether it's possible to spread my data across multiple instances in Google Cloud SQL. Say in instance 1 I have 10 tables, 1 GB each, and after a while table A needs more space, say 1.5 GB. Now there's not enough space for all this data in a single instance; how do you spread table A's data across different instances? Is it possible to do so?
Thank you,
Rodrigo.
As per the Google storage documentation:
If you reach the free storage limit, everything in Google Drive, Gmail and Picasa will still be accessible, but you won't be able to create or add anything new over the free storage limit.
Consider a scenario with a database containing hundreds of millions of rows, reaching sizes of 500 GB, with maybe ~20 users. Mostly it's storage for aggregated data to be reported on later.
Would SQL Azure be able to handle this scenario? If so, does it make sense to go that route, compared to purchasing and housing 2+ high-end servers ($15k-$20k each) in a co-location facility plus all maintenance and backups?
Did you consider using Azure Table storage? Azure Tables do not have referential integrity, but if you are simply storing many rows, is that an option for you? You could use SQL Azure for your transactional needs and Azure Tables for those tables that do not fit in SQL Azure. Also, Azure Tables will be cheaper.
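A minimal sketch of that hybrid approach, using the current azure-data-tables Python library; the connection string, table name, and entity fields are all hypothetical:

```python
# Sketch: push bulky, non-relational rows to Azure Table storage while keeping
# transactional data in SQL Azure. Connection string and names are hypothetical.
from azure.data.tables import TableServiceClient

service = TableServiceClient.from_connection_string("<storage-connection-string>")
table = service.create_table_if_not_exists("aggregates")

table.create_entity({
    "PartitionKey": "2013-06",      # e.g. partition aggregates by month
    "RowKey": "customer-42",        # unique within the partition
    "total_orders": 118,
    "total_spend": 2431.50,
})
```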
SQL Azure databases are limited to 50 GB (at the moment),
as described in the General Guidelines and Limitations.
I don't know whether SQL Azure can handle your scenario: 500 GB seems like a lot and does not appear in the pricing list (50 GB max). I'm just trying to give some perspective on the pricing.
The official pricing of SQL Azure is around $10 per GB per month (http://www.microsoft.com/windowsazure/pricing/).
Therefore, 500 GB would be roughly $5k each month. Two high-end servers at $20k each (without license fees, maintenance and backups) take about 8 months to pay off.
Or, from another point of view: assuming you change your servers every 4 years, does a budget of $240k ($5k * 48 months) cover the hardware, installation/configuration, license fees and maintenance costs? (Not counting bandwidth and backups, since you'll pay extra for those with SQL Azure too.)
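For reference, the arithmetic behind those figures, spelled out; all inputs are the rough numbers quoted in this answer:

```python
# The back-of-the-envelope math from this answer, spelled out.
price_per_gb_month = 10        # rough SQL Azure price quoted above, $/GB/month
db_size_gb = 500
server_cost = 20_000           # per high-end server
server_count = 2
replacement_cycle_months = 48  # replace servers every 4 years

sql_azure_monthly = price_per_gb_month * db_size_gb              # ~$5,000 / month
hardware_cost = server_cost * server_count                        # $40,000 up front
payoff_months = hardware_cost / sql_azure_monthly                 # ~8 months
four_year_cloud_spend = sql_azure_monthly * replacement_cycle_months  # ~$240,000
print(payoff_months, four_year_cloud_spend)
```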
One option would be to use SQL Azure sharding. This is a way to spread the data over multiple SQL Azure databases, and it has the advantage that each database uses a different CPU and hard drive (since each database is actually stored on a different machine in the data center), which should give you really good performance. Of course, this assumes your database can actually be sharded. There is some more info on this sharding here.
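A very reduced illustration of the idea: a shard map in the application that assigns key ranges to separate SQL Azure databases. The connection strings and ranges are invented, and real sharding also has to deal with fan-out queries and rebalancing:

```python
# Simplified range-based shard map: each key range lives in its own SQL Azure
# database, so each range gets its own CPU and disk. All names are hypothetical.
SHARD_MAP = [
    # (low key inclusive, high key exclusive, connection string)
    (0,          5_000_000,  "Server=shard0.database.windows.net;Database=app0"),
    (5_000_000,  10_000_000, "Server=shard1.database.windows.net;Database=app1"),
    (10_000_000, 15_000_000, "Server=shard2.database.windows.net;Database=app2"),
]

def connection_for(row_id: int) -> str:
    """Return the connection string for the database holding this row id."""
    for low, high, conn in SHARD_MAP:
        if low <= row_id < high:
            return conn
    raise ValueError(f"no shard covers id {row_id}")

print(connection_for(7_123_456))
```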