Microservices centralized database model - MySQL

Currently we have several microservices; each has its own database model and migrations, provided by the GORM Golang package. We also have a big old MySQL database, which goes against microservices principles, but we can't replace it. I'm afraid that as the number of microservices grows, we will get lost among the many database models. When I add a new column in a microservice, I just type service migrate in the terminal (there is a CLI for the run and migrate commands) and it refreshes the database.
What is the best practice for managing this? For example, with 1000 microservices, nobody is going to run service migrate on every one of them whenever someone changes the models. I'm thinking about a centralized database service where we just add a new column and it stores all the models with all the migrations. The only problem is: how will the services find out about database model changes? This is how we store, for example, a user in a service:
type User struct {
    ID       uint           `gorm:"column:id;not null" sql:"AUTO_INCREMENT"`
    Name     string         `gorm:"column:name;not null" sql:"type:varchar(100)"`
    Username sql.NullString `gorm:"column:username;not null" sql:"type:varchar(255)"`
}

func (u *User) TableName() string {
    return "users"
}

Depending on your use cases, MySQL Cluster might be an option. The two-phase commits used by MySQL Cluster make frequent writes impractical, but if write performance isn't a big issue, I would expect MySQL Cluster to work out better than connection pooling or queuing hacks. Certainly worth considering.

If I'm understanding your question correctly, you're trying to still use one MySQL instance but with many microservices.
There are a few ways to make an SQL system work here:
You could create a type of microservice that handles data inserts/reads from the database and takes advantage of connection pooling, and have the rest of your services do all their data reads/writes through these services. This will definitely add a bit of extra latency to all your writes/reads and will likely be problematic at scale.
You could look for a multi-master SQL solution (e.g. CitusDB) that scales easily; you can use a central schema for your database and just make sure to handle edge cases for data insertion (de-duping, etc.).
You can use data-streaming architectures like Kafka or AWS Kinesis to transfer your data to your microservices and make sure they only deal with data through these streams. This way, you can de-couple your database from your data.
The best way to approach it in my opinion is #3. This way, you won't have to think about your storage at the computation layer of your microservice architecture.
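To make #3 a little more concrete, here is a minimal sketch of a service that consumes its data through a stream instead of querying the shared MySQL instance directly. It assumes Go with the segmentio/kafka-go client; the broker address, consumer group, and topic are placeholders, not anything from your setup.

package main

import (
    "context"
    "fmt"
    "log"

    "github.com/segmentio/kafka-go"
)

func main() {
    // Each service only sees the data that flows through its stream, so the
    // shared MySQL instance stays decoupled from the computation layer.
    r := kafka.NewReader(kafka.ReaderConfig{
        Brokers: []string{"localhost:9092"}, // placeholder broker
        GroupID: "user-service",             // illustrative consumer group
        Topic:   "user-events",              // illustrative topic
    })
    defer r.Close()

    for {
        msg, err := r.ReadMessage(context.Background())
        if err != nil {
            log.Println("read:", err)
            break
        }
        // Apply the change to this service's own view of the data here.
        fmt.Printf("offset %d: %s = %s\n", msg.Offset, msg.Key, msg.Value)
    }
}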
Not sure what service you're using for your microservices, but StdLib enforces a few conventions (e.g. only transferring data through HTTP) that help folks wrap their heads around it all. AWS Lambda also works very well with Kinesis as a source to launch the function, which could help with the #3 approach.
Disclaimer: I'm the founder of StdLib.

If I understand your question correctly it seems to me that there may be multiple ways to achieve this.
One solution is to have a schema version somewhere in the database that your microservices periodically check. When your database schema changes, you increase the schema version. If a service then notices that the database schema version is higher than its own schema version, it can migrate the schema in code, which GORM allows.
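A rough sketch of that first option, reusing the User model from the question and GORM v1 (to match the struct tags). The schema_version table and the version constant are assumptions of this sketch, not something GORM provides:

package userservice

import (
    "time"

    "github.com/jinzhu/gorm" // GORM v1, matching the tags in the question
)

// serviceSchemaVersion is the schema version this build of the service expects.
const serviceSchemaVersion = 3 // illustrative value

// watchSchema periodically compares the version stored in the database with the
// service's own version and lets GORM migrate the models when the DB is newer.
func watchSchema(db *gorm.DB, interval time.Duration) {
    for range time.Tick(interval) {
        var dbVersion int
        // Assumes a central, one-row schema_version(version) table.
        row := db.Raw("SELECT version FROM schema_version LIMIT 1").Row()
        if err := row.Scan(&dbVersion); err != nil {
            continue // table missing or DB unreachable; try again next tick
        }
        if dbVersion > serviceSchemaVersion {
            db.AutoMigrate(&User{}) // User is the model from the question
        }
    }
}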
Other options depend on how you run your microservices. For example, if you run them on an orchestration platform (e.g. Kubernetes), you could put the migration code somewhere to run when your service initializes. Then, once you update the schema, you can force a rolling restart of your containers, which would in turn trigger the migration.
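And a sketch of the second option - just run the migration unconditionally when the service boots, so a rolling restart is all it takes to roll out a schema change (again GORM v1; the DSN is a placeholder):

package main

import (
    "log"

    "github.com/jinzhu/gorm"
    _ "github.com/jinzhu/gorm/dialects/mysql" // MySQL dialect for GORM v1
)

func main() {
    // Placeholder DSN; in Kubernetes this would normally come from a Secret or env var.
    db, err := gorm.Open("mysql", "user:pass@tcp(mysql:3306)/app?parseTime=true")
    if err != nil {
        log.Fatalf("connect: %v", err)
    }
    defer db.Close()

    // Runs on every start: a no-op when nothing changed, applies new columns otherwise.
    if err := db.AutoMigrate(&User{}).Error; err != nil { // User is the model from the question
        log.Fatalf("migrate: %v", err)
    }

    // ... start the HTTP/gRPC server here ...
}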

Related

Mirroring homogeneous data from one MySQL RDS to another MySQL RDS

I have two MySQL RDS instances (hosted on AWS). One of these instances is my "production" RDS, and the other is my "performance" RDS. They have the same schema and tables.
Once a year, we take a snapshot of the production RDS, and load it into the performance RDS, so that our performance environment will have similar data to production. This process takes a while - there's data specific to the performance environment that must be re-added each time we do this mirror.
I'm trying to find a way to automate this process, and to achieve the following:
Do a one time mirror in which all data is copied over from our production database to our performance database.
Continuously (preferably weekly) mirror all new data (but not old data) between our production and performance MySQL RDS's.
During the continuous mirroring, I'd like the production data not to overwrite anything already in the performance database; I'd only want new data to be inserted into the performance database.
During the continuous mirroring, I'd like to change some of the data as it goes onto the performance RDS (for instance, I'd like to obfuscate user emails).
The following are the tools I've been researching to assist me with this process:
AWS Database Migration Service seems to be capable of handling a task like this, but the documentation recommends using different tools for homogeneous data migration.
Amazon Kinesis Data Streams also seems able to handle my use case - I could write a "fetcher" program that reads all new data from the production MySQL binlog and sends it to Kinesis Data Streams, then write a Lambda that transforms the data (and decides what to send/add/obfuscate) and sends it to my destination (the performance RDS, or, if I can't write to it directly, a consumer HTTP endpoint I write that updates the performance RDS).
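To illustrate the transform step I have in mind, something like this rough Go sketch using the aws-lambda-go SDK (the record shape, the email field, and the write to the performance RDS are placeholders):

package main

import (
    "context"
    "crypto/sha256"
    "encoding/json"
    "fmt"
    "log"

    "github.com/aws/aws-lambda-go/events"
    "github.com/aws/aws-lambda-go/lambda"
)

// rowChange is an illustrative shape for what the binlog "fetcher" would push to Kinesis.
type rowChange struct {
    Table string                 `json:"table"`
    Row   map[string]interface{} `json:"row"`
}

func handler(ctx context.Context, e events.KinesisEvent) error {
    for _, rec := range e.Records {
        var change rowChange
        if err := json.Unmarshal(rec.Kinesis.Data, &change); err != nil {
            log.Printf("skipping malformed record: %v", err)
            continue
        }
        // Obfuscate user emails before they ever reach the performance RDS.
        if email, ok := change.Row["email"].(string); ok {
            sum := sha256.Sum256([]byte(email))
            change.Row["email"] = fmt.Sprintf("%x@example.com", sum[:8])
        }
        // TODO: insert-if-absent into the performance RDS (or POST to the
        // consumer HTTP endpoint) goes here.
    }
    return nil
}

func main() {
    lambda.Start(handler)
}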
I'm not sure which of these tools to use - DMS seems to be built for migrating heterogeneous data and not homogeneous data, so I'm not sure if I should use it. Similarly, it seems like I could create something that works with Kinesis Data Streams, but the fact that I'll have to make a custom program that fetches data from MySQL's binlog and another program that consumes from Kinesis makes me feel like Kinesis isn't the best tool for this either.
Which of these tools is best capable of handling my use case? Or is there another tool that I should be using for this instead?

How to connect to an arbitrary database using FaaS?

I just did some reading about serverless computing and FaaS. When using FaaS to access an arbitrary database, we need to establish and close a database connection each time. In, let's say, a Node application, we would usually establish the connection once and reuse it for multiple requests.
Correct?
I have a hosted MongoDB at mLab and thought about implementing a REST API with Google's Cloud Functions service. I don't know how to handle the database connection efficiently.
For sure, things will get clearer while coding and testing, but I would like to know my chances of success before spending a lot of time on it.
Thanks
Stefan
Serverless platforms reuse the underlying containers between distinct function invocations whenever possible. Hence you can set up a database connection pool in the global function scope and reuse it for subsequent invocations - as long as the container stays warm. GCP has a guide here using MySQL but I imagine the same applies to MongoDB.
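For example, in Go the pattern looks roughly like this (a sketch only - the driver, DSN environment variable, and query are placeholders; for MongoDB you would keep a Mongo client in the same global position):

package fn

import (
    "database/sql"
    "fmt"
    "log"
    "net/http"
    "os"

    _ "github.com/go-sql-driver/mysql" // swap for the driver your database needs
)

// db lives in the global scope, so it survives between invocations
// for as long as the platform keeps the container warm.
var db *sql.DB

func init() {
    var err error
    db, err = sql.Open("mysql", os.Getenv("DB_DSN")) // DSN supplied via env var
    if err != nil {
        log.Fatalf("sql.Open: %v", err)
    }
    db.SetMaxOpenConns(2) // keep the pool small; many function instances may run at once
}

// Handler is the function's HTTP entry point; it reuses the pooled connection.
func Handler(w http.ResponseWriter, r *http.Request) {
    var now string
    if err := db.QueryRow("SELECT NOW()").Scan(&now); err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    fmt.Fprintln(w, now)
}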

Multiple readonly databases (backup connections) on Sails.js

I have 2 read-only databases on Amazon AWS RDS. As read-only databases can't be multi-AZ, I need to manage them manually. My question is: can Sails control each connection so that it uses my readonly1 database by default and, when that one fails to connect, starts using the readonly2 database?
Thanks.
What @Sangharsh said, but also: Sails is great when your use case is fairly simple; when it isn't, it's better to use more elaborate methods of interfacing with the database (e.g. the mysql npm package) and structure this into adapters/services/helpers.
On a simpler scope, if you only want to read records and fall back to the alternative database when the connection fails, you could have something like a MySQLService with a read() method that does what you described, if that's the only thing you need to do.
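The shape of that fallback read is roughly the following (sketched in Go with database/sql just to show the idea, since it is language-agnostic; in Sails you would wrap the same logic in a service or helper, and both DSNs here are placeholders):

package dbread

import (
    "database/sql"

    _ "github.com/go-sql-driver/mysql"
)

// readAny runs the query against the primary read-only replica and falls back
// to the secondary one when the query against the first fails. In a real
// service you would keep both pools open instead of reopening them per call.
func readAny(query string, args ...interface{}) (*sql.Rows, error) {
    dsns := []string{
        "app:pass@tcp(readonly1:3306)/app", // placeholder DSNs
        "app:pass@tcp(readonly2:3306)/app",
    }
    var lastErr error
    for _, dsn := range dsns {
        db, err := sql.Open("mysql", dsn)
        if err != nil {
            lastErr = err
            continue
        }
        rows, err := db.Query(query, args...)
        if err == nil {
            return rows, nil
        }
        lastErr = err
        db.Close()
    }
    return nil, lastErr
}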
That being said, you should do some research into load balancing the servers in AWS, as you can do this there without any Sails-related code :-)

SQLite3 database per customer

Scenario:
Building a commercial app consisting of a RESTful backend in Symfony2 and a frontend in AngularJS.
This app will never be used by many customers (if I get to sell 100 that would be fantastic - hopefully many more, but in any case it won't be massive).
I want to have a multi-tenant structure for the database, with one schema per customer (they store sensitive information about their customers).
I'm aware of the problems with updating schemas, but I will have to live with that.
Today I have a MySQL demo database that I clone each time a new customer purchases the app.
There is no relationship between my customers, so I don't need to query across multiple shards.
A single customer may use the app from several devices at a time, but there won't be massive write operations on the db.
My question
While trying to set up some functional tests for the backend API, I read about having a dedicated SQLite database for loading test data, which seems like a good idea.
However, I wonder if it's also a good idea to switch from MySQL to SQLite3 as the main database for the application, and whether it's common practice to have one dedicated SQLite3 database PER CLIENT. I've never used SQLite and I have no idea whether the process of updating a schema and replicating the changes across all the databases works the same way as in other RDBMSs.
Is this an appropriate scenario for SQLite?
Any suggestion (aka tutorial) on how to achieve this?
[I wonder] if it's a common practice to have one dedicated SQLite3 database PER CLIENT
Only if the database is deployed along with the application, like on a phone. Otherwise I've never heard of such a thing.
I've never used SQLite and I have no idea if the process of updating a schema and replicate the changes in all the databases is done in the same way as for other RDBMS
SQLite is a SQL database and responds to ALTER TABLE and the like. As for updating all the schemas, you'll have to re-run the update for all schemas.
Schema synching is usually handled by an outside utility, usually your ORM will have something. Some are server agnostic, some only support specific servers. There are also dedicated database change management tools such as Sqitch.
However I wonder if it's also a good idea to switch from MySQL to SQLite3 database as my main database support for the application, and
SQLite's main advantage is not requiring you to install and run a server. That makes sense for quick projects, or where you have to deploy the database along with the application, like a phone app. For a server-based application, there's no problem having a database server, and SQLite's very restricted set of SQL features becomes a disadvantage. It will also likely run slower than a server database for anything but the simplest queries.
Trying to set some functional tests for the backend API I read about having a dedicated sqlite database for loading testing data, which seems to be good idea.
Under no circumstances should you test with a different database than the production database. Databases do not all implement SQL the same way (MySQL is particularly bad about this), and your tests will not reflect reality. Running a MySQL instance for testing is not much work.
This separate schema thing claims three advantages...
Extensibility (you can add fields whenever you like)
Security (a query cannot accidentally show data for the wrong tenant)
Parallel Scaling (you can potentially split each schema onto a different server)
What they're proposing is equivalent to having a separate, customized copy of the code for every tenant. You wouldn't do that; it's obviously a maintenance nightmare. Code at least has the advantage of version control systems with branching and merging. I know of only one database change management tool that supports branching: Sqitch.
Let's imagine you've made a custom change to tenant 5's schema. Now you have a general schema change you'd like to apply to all of them. What if the change to 5 conflicts with this? What if the change to 5 requires special data migration different from everybody else? Now let's imagine you've made custom changes to ten schemas. A hundred. A thousand? Nightmare.
Different schemas will require different queries. The application will have to know which schema each tenant is using, there will have to be some sort of schema version map you'll need to maintain. And every different possible query for every different possible schema will have to be maintained in the application code. Nightmare.
Yes, putting each tenant in a separate schema is more secure, but that only protects against writing bad queries or including a query builder (which is a bad idea anyway). There are better ways to mitigate the problem, such as the view filter suggested in the docs. There are many other ways an attacker can access tenant data that this doesn't address: gaining a database connection, gaining access to the filesystem, sniffing network traffic. I don't see the small security gain being worth the maintenance nightmare.
As for scaling, the article is ten years out of date. There are far, far better ways to achieve parallel scaling than to coarsely put schemas on different servers; there are entire databases dedicated to this idea. Fortunately, you don't need any of this! Scaling won't be a problem for you until you have tens of thousands to millions of tenants. The idea of front-loading your design with a schema maintenance nightmare for a hypothetical big parallel scaling problem is putting the cart so far before the horse, it's already at the pub having a pint.
If you want to use a relational database, I would recommend PostgreSQL. It has a very rich SQL implementation, it's fast and scales well, and it has something that renders this whole idea of separate schemas moot: a built-in JSON type. This can be used to implement the "extensibility" mentioned in the article. Each table can have a meta column using the JSON type into which you can throw any extra data you like. The application does not need special queries; the meta column is always there. PostgreSQL's JSON operators make working with the meta data very easy and efficient.
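For example, a minimal sketch of that meta-column pattern (the table, columns, and DSN are all illustrative; Go with the lib/pq driver):

package main

import (
    "database/sql"
    "fmt"
    "log"

    _ "github.com/lib/pq" // PostgreSQL driver
)

func main() {
    db, err := sql.Open("postgres", "postgres://app:pass@localhost/app?sslmode=disable") // placeholder DSN
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    // One shared schema: tenant-specific extras go into the jsonb meta column.
    if _, err := db.Exec(`CREATE TABLE IF NOT EXISTS customers (
        id        serial PRIMARY KEY,
        tenant_id int    NOT NULL,
        name      text   NOT NULL,
        meta      jsonb  NOT NULL DEFAULT '{}'
    )`); err != nil {
        log.Fatal(err)
    }

    // A tenant that needs an extra "loyalty_tier" field just stores it in meta...
    if _, err := db.Exec(`INSERT INTO customers (tenant_id, name, meta) VALUES ($1, $2, $3)`,
        42, "ACME", `{"loyalty_tier": "gold"}`); err != nil {
        log.Fatal(err)
    }

    // ...and the ->> operator reads it back without any schema change.
    var tier sql.NullString
    if err := db.QueryRow(`SELECT meta->>'loyalty_tier' FROM customers WHERE tenant_id = $1 AND name = $2`,
        42, "ACME").Scan(&tier); err != nil {
        log.Fatal(err)
    }
    fmt.Println(tier.String)
}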
You could also look into a NoSQL database. There are plenty to choose from and many support custom schemas and parallel scaling. However, it's likely you will have to change your choice of framework to use one that supports NoSQL.

mySQL authoritative database - combined with Firebase

We have built a LAMP-stack API application via PHP Laravel. This currently uses a local mySQL instance. We have mostly implemented views in AngularJS.
In order to use Firebase, we need to sync data between the authoritative store in mySQL with anything relevant that exists on Firebase, as close to real-time as possible. This means that other parts of the app which are not real-time and don't use Firebase can also serve up fresh content that's very recently been entered into the system.
I know that Firebase is essentially a noSQL database in the cloud. My question is - how do I write a wrapper or a means to sync the canonical version of my Firebase into my database of record - mySQL?
Update to answer - our final decision - ditching Firebase as an option
We have decided against this, as we can easily have a socket.io instance on the same server with an extremely low latency connection to mySQL, so that the two can remain in sync. There's no need to go across the web when resources and endpoints can exist on localhost. It also gives us the option to run our app without any internet connection, which is important if we sell an on-premise appliance to large companies.
A noSQL sync platform like Firebase is really just a temporary store that makes reads/writes faster in semi-real-time. If they attempt to get into the "we also persist everything for you" business - that's a whole different ask with much more commitment required.
The guarantee of eventual consistency between mySQL and Firebase is more important to get right first - to prevent problems down the line. Also, an RDBMS is essential to our app - it's the only way to attack a lot of data-heavy problems in our analytics/data mappings - there are very strong reasons most of the world still uses an RDBMS like mySQL, etc. You can make those very reliable too - through Amazon RDS and Google Cloud SQL.
There's no specific problem beyond scaling real-time sync that Firebase actually solves for us, which other open source frameworks don't already solve. If their JS lib actually handled offline scenarios (when you START offline) elegantly, I might have considered it, but it doesn't do that yet.
So, YMMV - but in our specific case, we're not considering Firebase for the reasons given above.
The entire topic is incredibly broad, definitely too broad to provide a simple answer to.
I'll stick to the use-case you provided in the comments:
Imagine that you have a checklist stored in mySQL, comprised of some attributes and a set of steps. The steps are stored in another table. When someone updates this checklist on Firebase - how would I sync mySQL as well?
If you insist on combining Firebase and mySQL for this use-case, I would:
Set up your Firebase as a work queue: var ref = new Firebase('https://my.firebaseio.com/workqueue')
Have the client push a work item into Firebase: ref.push({ task: 'id-of-state', newState: 'newstate' })
Set up a (Node.js) server that:
monitors the work queue (ref.on('child_added', ...))
updates the item in the mySQL database
removes the task from the queue
See this github project for an example of a work queue on top of Firebase: https://github.com/firebase/firebase-work-queue