Single Cloud SQL or multiple databases? - mysql

Not sure how to ask this question, but as I understand google cloud SQL supports the idea of instances, which are located throughout their global infrastructure...so I can have a single database spread across multiple instances all over the world.
I have a a few geographic regions our app serves...the data doesn't really need to be aggregated as a whole and could be stored individually on separate databases in regions accordingly.
Does it make sense to serve all regions off one database/multiple instances? Or should I segregate each region into it's own database and host the data the old fashion way?

If by “scaling” you mean memory size, then you can start with a smaller instance (less RAM) and move up to a more powerful instance (more RAM) later.
But if you mean more operations per second, there is a certain max size and max number of operations that one Cloud SQL instance can support. You cannot infinitely scale one instance. Internally the data for one instance is indeed stored over multiples machines, but that is more related to reliability and durability, and it does not scale the throughput beyond a certain limit.
If you really need more throughput than what one Cloud SQL instance can provide, and you do need a SQL based storage, you’ll have to use multiple instances (i.e. completely separate databases) but your app will have to manage them.
Note that the advantages of Cloud go beyond just scalability. Cloud SQL instances are managed for you (e.g. failover, backups, etc. are taken care of). And you get billing based on usage.
(Cloud SQL team)

First, regarding the overall architecture: An "instance" in Google Cloud SQL is essentially one MySQL database server. There is no concept of "one database/multiple instances". Think of your Cloud SQL "instance" as the database itself. At any point in time, the data from a Cloud SQL instance is served out from one location -- namely where your instance happens to be running at that time. Now, if your app is running in Google App Engine or Google Compute Engine, then you can configure your Cloud SQL instance so that it is located close to your app.
Regarding your question of one database vs. multiple databases: If your database is logically one database and is served by one logical app, then you should probably have one Cloud SQL instance. (Again, think of one Cloud SQL instance as one database). If you create multiple Cloud SQL instances, they will be siloed from one another, and your app will have to do all the complex logic of managing them as completely different databases.
(Google Cloud SQL team)

Related

How efficiently use MySQL for Stock/TimeSeries related data?

I use Python and MySQL to ingest data via API and generate signals and order execution. Currently, things are functional yet coupled, that is, the single script is fetching data, storing it in MySQL, generating signals, and then executing orders. By tightly coupled does not mean all logic is in the same file, there are separate functions for different tasks. If somehow the script breaks everything will be halted. The way DB tables are generated is based on the instrument available on the fly after running a filter mechanism. The python code creates a different table of the same schema but with different table names based on the instrument name.
Now I am willing to separate the parts:
Data Ingestion (A Must)
Signal Generation
Order Execution
Reporting
First three I am mainly focusing. My concern is that if separate processes are running, acting on the same tables, will it generate any lock or something? How do I take care of it smoothly? or, is MySQL good enough for this or I move on to some other DB Like Postgres or others?
We are already using Digital Ocean Instance, MySQL is currently installed on the same instance.
If you intend to ingest/query time-series at scale, a conventional RDBMS will fall short at one point or another. They are designed for a use case in which reads are more frequent than writes, and optimise for that.
There is a whole family of databases designed specifically for working with Time-Series data. These time-series databases can ingest data at high throughput while running queries on top, and they usually give you lifecycle capabilities so you can decide what to do when data keeps growing.
There are many options available, both open source and proprietary. Out of those databases I would recommend you to try QuestDB because of a few reasons:
It is open source and Apache 2.0 licensed, so you can use it anywhere for anything
It is a single binary (or docker container) to operate
You query data using SQL, (with extensions for time series)
You can insert data using SQL, but you will experience locks if using concurrent clients. However you can also ingest data using the ILP protocol which is designed for ingestion speed. There are official clients in 7 languages so you don't have to deal with the low-level details
It is blazingly fast. I have seen over 2 million inserts per second on a single instance and some users report sustained workloads of over 100,000 events per second
It is well supported on Digital Ocean
There are a lot of public references (and many users who are not a reference) in the finance/trading/crypto industries

How to reduce Google Cloud SQL instance size?

I have a Google Cloud SQL MySQL 2nd Gen 5.7 instance with 70GB of unused storage. I want to reduce the unused storage of my instance as this is one of the major hosting costs.
The only way that I'm seeing as a possibility is dumping all the database to a new Google Cloud SQL instance created with the right storage capacity. However this approach has several pitfalls:
It would require a lot of time to dump and load the database (this would require severe downtime in the websites that are using the db)
I would have to change all the websites that use that database because the database's IP would change
What alternatives do I have?
As it is indicated in the official documentation, it is not possible to reduce the storage capacity of a Cloud SQL instance. You can give more capacity to an instance if needed, but the change is permanent and cannot be rolled back.
The best option for you right now would be to create a SQL dump file and export your current data into it. Then, in a new Cloud SQL instance with the desired capacity, import the SQL dump file so that all your data is stored in the new instance. To reduce time and costs for this process, please follow these best practices, including the use of the appropriate flags, compressing your data and other tips available for faster and less expensive imports and exports.
However, the possibility of reducing the capacity of a Cloud SQL instance has been requested as a new feature and is currently being worked on. Head to the link and click on the star icon on the top left corner to get email notifications on any progress made on this request.
There are no alternatives to reduce Instance size(Storage capacity) except one way which you mentioned in your question.
So, the way is first, export data from your old db instance:
Then, import the data into your new db instance which you newly created with reduced Instance size:
And as you know, it's impossible to reduce Instance size for all PostgreSQL, MySQL and SQL Server on Cloud SQL and only increasing Instance size is possible after creating your db instance.

AWS MySQL RDS architecture in multi regions

I am starting to plan a multi region (us-east & us-west) web app that involves AWS RDS MySQL db. i am going to put this in AWS. Can any aws guru clarify my concern?
I will have the multi AZ for redundancy/High Availibity. And the Read DB accross regions for faster READ request processing.
My concern/question:
If the master DB instance is in US-west. and if the write request from instances/computes/app server in us-east are routed to db endpoint which is in us-west, does this cause lag in the app OR is it the way how many AWS users uses?
The read instance local to the app servers are not for writes.
You can't defeat the speed of light.
Having a server write to the database that may be 80ms away may not result in acceptable performance. Only you can determine this.
You run into the same issue if you use MySQL replication across regions.
Now, if you just want to have read replicas across regions, with all writes directed to a single region, you can probably make that work.
If you really need a fast, globally distributed database, consider using something like DynamoDB.

Amazon AWS RDS MySQL - Instances and Pricing

I have 15 databases that are running off my shared hosting account for $5/mnth. I would like to use AWS RDS.
1) Do I have to create a new INSTANCE for each database?
2) Won't this be expensive $191/mnth ($12.75/mnth x 15DBs for the cheapest db.t2.micro instance)? http://aws.amazon.com/rds/pricing/
3) How can I optimize the management of resources? (eg. Is putting more tables into a single DB instance the best solution?)
No. One instance can host multiple databases. If they are used for different purposes, you will want to set up separate grants for each for security.
As i said in 1, you don't need to use multiple instances. Although depending on your workload, a micro may or may not be large enough.
That's a complicated question. Security requirements often come into play.

Database access related info on clustered environment

My understanding of database cluster is less because I have not worked on them. I have the below question.
Database cluster has two instance db server 1 & server 2. Each instance will have a copy of databases, considering the database has say Table A.
Normally a query request will be done by only one of the servers which is randomly decided.
Question1: I would like to know given the access can we explicitly tell which server should process the query?
Question2: Given the access, can a particular server say db server 2 be accessed from outside directly to query?
Either in Oracle or MySQL database.
/SR
There are many different ways to implement a cluster. Both MySQL and Oracle provide solutions out of the box - but very different ones. And there's always the option of implementing different clustering on top of the DBMS itself.
It's not possible to answer your question unless you can be specific about what cluster architecture and DBMS you are talking about.
C.
In Oracle RAC (Real Application Clusters), the data-storage (ie the disks on which the data is stored) are shared, so it's not really true to say there is more than one copy of the data... there is only one copy of the data. The two servers just access the storage separately (albeit with some co-operation)
From an Oracle perspective:
cagcowboy is correct; in an Oracle RAC system, there is but one database (the set of files on disk), multiple database instances (executing programs) on different logical or physical servers access those same files.
In Oracle, a query being executed in parallel can perform work using the resources of any member of the cluster.
One can "logically" partition the cluster so that a particular application prefers to connect to Member 1 of the cluster instead of Member 2 through the use of service names. However, if you force an application to always connect to a particular member of the cluster, you have eliminated a primary justification to cluster - high availability. Similarly, if the application connects to a functionally random member of the cluster, different database sessions with read and/or write interest in the same Oracle rows can significantly degrade performance.