Database Design: When should I use multiple databases? [closed] - mysql

Background:
I'm trying to build the backend services of an app. The app has rooms that a user can join. Once in a room, the user can exchange data with other users over a socket. Outside the room, the user can view processed data from the transactions that happened inside the rooms they joined. The room list, room information, and the data transactions inside each room should be stored in a database.
My idea is to create the project with one database. However, an experienced developer suggested splitting the database into two:
One that uses MongoDB for storing data transactions happening inside the room.
One that uses MySQL for storing and returning room list, room information and analytics of what happened inside a room.
Problem I see with using multiple databases:
I did some research and, from what I understand, using multiple databases is not recommended but can work if the data are unrelated. Displaying analytics requires processing the data transactions that happened inside a room and also displaying the room's information. With the two-database approach, I would need to retrieve data from both databases to achieve this.
Question:
I personally think a single database is easier, since I don't see the data inside and outside of the room as 'unrelated'. Am I missing an important point about when to use multiple databases?
Thanks in advance. Have a good day.

You can look at this problem from two perspectives: technical and practical.
Technically, if your back-end is expected to become very complex or to operate at large scale, it is recommended to break it down into multiple microservices, each in charge of one small task. Ideally, these small services help you achieve separation of concerns, so each service works with only one piece of the data. If other services need to read and/or modify that piece of data, they have to go through the service in charge.
In this case, you can pick the database that suits the data each service deals with. For instance, you can use MySQL for transactional data, MongoDB for large schemaless content, or Elasticsearch if you want to perform text search.
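As a rough illustration of that separation, here is a minimal Python sketch. The class and function names (`RoomService`, `RoomEventService`, `room_analytics`) are hypothetical, and plain in-memory dictionaries stand in for the MySQL and MongoDB stores so the sketch runs without any server. The point is only that the analytics code asks each owning service for its data instead of reaching into either database directly.

```python
from collections import defaultdict


class RoomService:
    """Owns room metadata (the relational side, e.g. MySQL, in the answer's example)."""

    def __init__(self):
        # room_id -> {"name": ...}; a dict stands in for a rooms table.
        self._rooms = {}

    def create_room(self, room_id, name):
        self._rooms[room_id] = {"name": name}

    def get_room(self, room_id):
        return self._rooms[room_id]


class RoomEventService:
    """Owns the in-room data exchange (the document side, e.g. MongoDB)."""

    def __init__(self):
        # room_id -> list of event documents.
        self._events = defaultdict(list)

    def record(self, room_id, user, payload):
        self._events[room_id].append({"user": user, "payload": payload})

    def events_for(self, room_id):
        return list(self._events[room_id])


def room_analytics(rooms, events, room_id):
    """Build an analytics view by going through each owning service."""
    room = rooms.get_room(room_id)
    room_events = events.events_for(room_id)
    return {
        "room": room["name"],
        "event_count": len(room_events),
        "active_users": sorted({event["user"] for event in room_events}),
    }


rooms = RoomService()
events = RoomEventService()
rooms.create_room(1, "lobby")
events.record(1, "alice", {"score": 3})
events.record(1, "bob", {"score": 5})
events.record(1, "alice", {"score": 1})
report = room_analytics(rooms, events, 1)
print(report)
```

Because the combination happens in application code, either service could later swap its storage engine without the analytics function changing.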
Practically, it is recommended to start small with one service and database (if you prefer to develop your app sooner), and then, over time break it down into multiple services as you need to add and/or improve features.
There are multiple points to keep in mind. First, if you expect to have a large user base, you should start development with the right architecture from the beginning to avoid scaling issues. Second, sometimes one database cannot perform the task you need. For example, text search in MySQL can be very inefficient compared with a dedicated search engine. Finally, there is no absolute right way of doing things. However you do one thing, another friend might show up tomorrow and ask why you did not do it their way. The most important thing is to start, and then learn and improve along the way. Facebook started with MySQL and it worked fine initially. Could it survive on one database type today? I suspect the answer is no, but would it have made sense for them to add all the N databases they have now back then?
Good luck developing!

Related

Should I use many databases in one application? [duplicate]

We are in the planning stages for a new multi-tenant SaaS app and have hit a deciding point. When designing a multi-tenant application, is it better to go for one monolithic database that holds all customer data (Using a 'customer_id' column) or is it better to have an independent database per customer? Regardless of the database decisions, all tenants will run off of the same codebase.
It seems to me that having separate databases makes backups / restorations MUCH easier, but at the cost of increased complexity in development and upgrades (Much easier to upgrade 1 database vs 500). It also is easier / possible to split individual customers off to separate dedicated servers if the situation warrants the move. At the same time, aggregating data becomes much more difficult when trying to get a broad overview of how customers are using the software.
We expect to have fewer than 250 customers for at least a year after launch, but they will be large customers and more will follow afterward.
As this is our first leap into SaaS, we are definitely looking to do this right from the start.
This is a bit long for a comment.
In most cases, you want one database with a separate customer id column in the appropriate tables. This makes it much easier to maintain the application. For instance, it is much easier to replace a stored procedure in one database than in 250 databases.
In terms of scalability, there is probably no issue. If you really wanted to, you could partition your tables by client.
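A minimal sketch of the shared-database approach, assuming a hypothetical `invoices` table and using Python's built-in `sqlite3` module purely so the example runs without a server (the same idea applies to MySQL or another engine). Every tenant-facing query filters on `customer_id`, while cross-tenant reporting remains a single query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One shared table; customer_id identifies the tenant that owns each row.
conn.execute(
    """
    CREATE TABLE invoices (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        amount_cents INTEGER NOT NULL
    )
    """
)
# Index the tenant column, since almost every query filters on it.
conn.execute("CREATE INDEX idx_invoices_customer ON invoices (customer_id)")
conn.executemany(
    "INSERT INTO invoices (customer_id, amount_cents) VALUES (?, ?)",
    [(1, 1000), (1, 2500), (2, 700)],
)


def invoices_for(conn, customer_id):
    """Tenant-scoped read: only rows belonging to the given customer."""
    rows = conn.execute(
        "SELECT amount_cents FROM invoices WHERE customer_id = ? ORDER BY id",
        (customer_id,),
    )
    return [amount for (amount,) in rows]


def totals_by_customer(conn):
    """Cross-tenant aggregate, which is simple when all data is in one database."""
    rows = conn.execute(
        "SELECT customer_id, SUM(amount_cents) FROM invoices "
        "GROUP BY customer_id ORDER BY customer_id"
    )
    return dict(rows.fetchall())


print(invoices_for(conn, 1))
print(totals_by_customer(conn))
```

The trade-off is that forgetting the `customer_id` filter in a single query can leak another tenant's rows, which is the access-control concern that a database per client avoids.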
There are some reasons why you would want a separate database per client:
Access control: maintaining access control at the database level is much easier than at the row level.
Customization: customizing the software for a client is much easier if you can just work in a single environment.
Performance bottlenecks: if the data is really large and/or there are really large numbers of transactions on the system, it might be simpler (and cheaper) to distribute databases on different servers rather than maintain a humongous database.
However, I think the default should be one database because of maintainability and consistency.
By the way, as for backup and restore: if a client requires this functionality, you will probably want to write custom scripts anyway. Although you could use the database-level backup and restore, you might have some particular needs, such as maintaining consistency with data not stored in the database.

Migrating from MySQL to a NoSQL database in production without code changes, and using MySQL without foreign keys and indexes [closed]

I have two scenarios here:
1. Migrating a MySQL database to NoSQL without code changes (no ORMs are used).
2. Using no foreign keys or indexes in MySQL (because they want to migrate to a different database in the future).
3. Doing all of this with very few code changes.
These questions were asked by my team lead, and I don't have a proper answer for him. Running MySQL without indexes and foreign keys seems very unusual to me, and if they don't intend to use MySQL the way it is meant to be used, why did they choose it in the first place?
I want to know whether people in the software industry often work like this, or whether they choose a database that fits their needs.
They say that foreign key validation is done at the API level, not at the MySQL level.
I don't understand their reasoning because I have little experience, so I can't explain why they say this. Please give me some insight into whether this is a good practice or not.
I don't think it will be possible without adding code: you need to implement, in some way, how your data is managed by your NoSQL DB engine. If the project is coded with a clear separation of business logic and database code, it's a simple matter of using the new database implementation instead of the old one. If that is not the case and your DB implementation has leaked into your business logic, then it will not be possible to switch without changing code. Depending on the size of the code base, that will most likely be too expensive.
If you want to see an example of a clean separation of DB logic from business logic, have a look at this repository: https://github.com/fathersson/money-transfer
(this is not my repository, I just stumbled upon it today)
If you want to learn and understand the principles driving that design, start by looking for "clean architecture" and/or "Domain Driven Design" - the first one is easier to understand in my opinion and there are some talks on YouTube by Robert C. Martin that you can have a look at before buying some books.
Edit: The project I'm working on at the moment did change from PostgreSQL running on RDS to DynamoDB by using a different repository, without changing any existing business logic. It saves a lot of money that way. So yes, changing the DB backend does happen and is driven by requirements.
In addition to that, when I start working on a new feature set/microservice/bounded context, I usually start with a simple in-memory repository implementation backed by a map. After I'm done with the initial set of use cases, I know more about the DB requirements and choose the DB engine based on these and on the general requirement to limit the number of different technologies in use.
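To make the repository idea concrete, here is a minimal Python sketch. All names in it (`Account`, `AccountRepository`, `InMemoryAccountRepository`, `transfer`) are hypothetical illustrations and are not taken from the linked repository. The business logic depends only on the abstract interface, so a MySQL, PostgreSQL, or DynamoDB implementation could later replace the map-backed one without touching `transfer`.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Account:
    account_id: str
    balance: int


class AccountRepository(ABC):
    """The only view of storage that the business logic is allowed to see."""

    @abstractmethod
    def get(self, account_id):
        ...

    @abstractmethod
    def save(self, account):
        ...


class InMemoryAccountRepository(AccountRepository):
    """Starting-point implementation backed by a plain dict (a map)."""

    def __init__(self):
        self._accounts = {}

    def get(self, account_id):
        return self._accounts[account_id]

    def save(self, account):
        self._accounts[account.account_id] = account


def transfer(repo, source_id, target_id, amount):
    """Business logic: knows nothing about which database sits behind repo."""
    source = repo.get(source_id)
    target = repo.get(target_id)
    if amount <= 0 or source.balance < amount:
        raise ValueError("invalid transfer amount")
    source.balance -= amount
    target.balance += amount
    repo.save(source)
    repo.save(target)


repo = InMemoryAccountRepository()
repo.save(Account("a", 100))
repo.save(Account("b", 20))
transfer(repo, "a", "b", 30)
print(repo.get("a").balance, repo.get("b").balance)
```

A database-backed implementation would subclass `AccountRepository` and issue queries inside `get` and `save`. A real one would also need transaction handling, which this sketch leaves out.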

Which database should I choose? MySQL or mongoDB? [closed]

I'm working on a project which is somewhat similar to WhatsApp, except that I'm not implementing the chat function.
The only thing I need to store in the database is the users' information, which won't be very large. I also need an offline database in the mobile app, which should be synced with the server database.
Currently I use MySQL for my server database, and I'm thinking of using JSON for syncing between the mobile app and the server. I then found that MongoDB has natural support for JSON, which made me wonder whether I should change to MongoDB.
So here are my questions:
Should I change to MongoDB or should I stay with MySQL? The data for each user won't be too large, and there are some requirements for data consistency. But MongoDB's JSON support is somewhat attractive.
I'm not familiar with the syncing procedure. I did some digging and it appears that JSON is a good choice, but what data should I put into the JSON files?
I actually flagged this as too broad and as attracting primarily opinion based answers but I'll give it a go anyhow and hope to stay objective.
First of all you have 2 separate questions here.
What database system should I use.
How do I sync between app and server.
The first one is easily answered because it doesn't really matter. Both are good options for storing data. MySQL is mature and stable, and MongoDB, although newer, has very good reviews; I don't know of any problems that would prevent it from being used. So take the database that you find easy to use.
As for the second, I'll first put in a disclaimer: entire books have been written about data synchronization between multiple entities, and after all this time it is still the subject of PhDs.
I would advise against directly synchronizing between the mobile app and the database, because that requires the database credentials to be contained within the app. Mobile apps can and will be decompiled and the credentials extracted, which would compromise your entire database. So you'll probably want to create some API which first does device/user authentication and then changes the database.
This already means that choosing MongoDB just for the sake of its JSON support would probably be a bad idea.
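The API layer recommended here can be sketched roughly as follows in Python. Everything in the sketch is an assumption for illustration: the `SERVER_SECRET` value, the `issue_token` and `handle_update` names, and the dict standing in for the database. A production system would use an established authentication scheme with expiring tokens rather than this bare HMAC. The point is only that the secret and the database access stay on the server, while the app holds nothing but a per-user token.

```python
import hashlib
import hmac

# Hypothetical secret that lives only on the server, never inside the app.
SERVER_SECRET = b"demo-secret-kept-on-the-server"


def issue_token(user_id):
    """Token handed to a user after login (the login itself is out of scope here)."""
    return hmac.new(SERVER_SECRET, user_id.encode(), hashlib.sha256).hexdigest()


def handle_update(database, user_id, token, profile):
    """API endpoint sketch: authenticate first, touch the database second."""
    expected = issue_token(user_id)
    # compare_digest avoids leaking information through comparison timing.
    if not hmac.compare_digest(expected, token):
        return {"status": 401}
    database[user_id] = profile
    return {"status": 200}


database = {}  # a dict stands in for the real server-side database
token = issue_token("u1")
print(handle_update(database, "u1", token, {"name": "Ann"}))
print(handle_update(database, "u2", token, {"name": "Mallory"}))
```

The second call is rejected because the token was issued for a different user, so a decompiled app exposes at most one user's token and never the database credentials.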
Now JSON itself is just a format for representing data with some structure, just like XML. As such, it is not a method of synchronization but of transport.
For synchronizing data it's important that you know the source of truth.
If you have 1 device <-> 1 record it's easy because the device will be the source of truth, after all the only mutations that take place are presumably done by the user on the device.
If you have n devices <-> 1 record then it becomes a whole lot more annoying. If you want to allow a device to change the state when offline you'll need to do some tricks to synchronize the data when the device comes back online. But this is probably a question too complex and situation dependent to answer on SO.
If you however force the device to always immediately propagate changes to the database then the database will always contain the most up to date record, or truth. Downside is that part of the app will not be functional when offline.
If offline updates don't change the state but merely add new records, then you can push those to the server when the device comes back online. But keep in mind that you won't be able to reliably order these events across devices.
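The append-only case can be sketched as a small outbox kept on the device. The `OfflineOutbox` class and its method names are hypothetical, and a plain Python list stands in for the server. The sketch also shows the ordering caveat: the server sees records in arrival order, not in the order they were created.

```python
from collections import deque


class OfflineOutbox:
    """Queues new records while offline and pushes them once the device reconnects."""

    def __init__(self, send):
        self._send = send        # callable that delivers one record to the server
        self._pending = deque()  # records created while offline, oldest first
        self.online = False

    def add(self, record):
        if self.online:
            self._send(record)
        else:
            self._pending.append(record)

    def reconnect(self):
        self.online = True
        while self._pending:
            # Send first and drop afterwards, so a failed send keeps the record queued.
            self._send(self._pending[0])
            self._pending.popleft()


server_log = []  # stands in for the server-side table of received records
phone = OfflineOutbox(server_log.append)
tablet = OfflineOutbox(server_log.append)

phone.add("phone-1")    # created first
tablet.add("tablet-1")  # created second
phone.add("phone-2")    # created third

tablet.reconnect()      # the tablet happens to come online before the phone
phone.reconnect()
print(server_log)
```

Each device's own records stay in order, but records from different devices interleave according to when each device reconnects.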

I would like to create a database with the goal of populating this database with comprehensive inventory information obtained via a shell script [closed]

I would like to create a database with the goal of populating this database with comprehensive inventory information obtained via a shell script from each client machine. My shell script currently writes this information to a single csv file located on a server through an ssh connection. Of course, if this script were to be run on multiple machines at once it would likely cause issues as each client potentially would try to write to the csv at the same time.
In the beginning, the inventory was all I was after; however, after more thought I began to ponder whether much more could be possible once I had gathered this information. If this information were contained in a database, I might be able to use it to initialize other processes based on the information of a specific machine or a group of "like" machines. It is important to note that I am already managing a multitude of processes by identifying specific machine information. However, pulling that information from a database after matching a unique identifier could (in my mind) greatly improve efficiency. It would also allow for more of a server-side approach, cutting down on the majority of client-side scripting. (Instead of gathering this information from the client machine at each client's startup, I would already have it in a central database, allowing a server to use the information and kick off specific events.)
I am completely new to SQL and am not certain it is 100% necessary. Is it necessary? For now I have decided to download and install both PostgreSQL and MySQL on separate Macs for testing. I am also fairly new to Stack Overflow and apologize up front if this is an inappropriate question or style of question. Any help, including a redirection, would be greatly appreciated.
I do not expect a step by step answer by any means, rather am just hoping for a generic "proceed..." "this indeed can be done..." or "don't bother there is a much easier solution."
As I come from the PostgreSQL world, I highly recommend it for its strong enterprise-level features and high standards compliance.
I always prefer to have a database for each project I work on, for the following benefits:
Normalized data is easier to process and build reports on;
Performance of database queries will be much better thanks to the caching done by the DB engine, indexes on your data, and optimized query paths;
You can greatly improve machine data processing by using SQL/MED, which allows querying external data sources directly from the database. You can have a look at the Multicorn project and the examples it provides.
Should you be required to deliver any kind of report to your management, the DB will be your friend, while doing this outside the DB will be overly complicated.
In short: go for the database!
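A minimal sketch of what the inventory could look like once it lives in a database. The `machines` table, its columns, and the function names are hypothetical. Python's built-in `sqlite3` module is used only so the example runs without a server; it stands in for the PostgreSQL setup recommended above and is not a claim about how the asker's shell script works.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per machine, keyed by a unique identifier (the serial number here).
conn.execute(
    """
    CREATE TABLE machines (
        serial TEXT PRIMARY KEY,
        hostname TEXT NOT NULL,
        model TEXT NOT NULL,
        ram_gb INTEGER NOT NULL
    )
    """
)
# An index on model makes "all machines like this one" lookups cheap.
conn.execute("CREATE INDEX idx_machines_model ON machines (model)")


def report_inventory(conn, serial, hostname, model, ram_gb):
    """Called once per client check-in; a repeat check-in replaces the old row."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO machines (serial, hostname, model, ram_gb) "
            "VALUES (?, ?, ?, ?)",
            (serial, hostname, model, ram_gb),
        )


def hostnames_for_model(conn, model):
    """Server-side lookup of a group of 'like' machines."""
    rows = conn.execute(
        "SELECT hostname FROM machines WHERE model = ? ORDER BY hostname",
        (model,),
    )
    return [hostname for (hostname,) in rows]


report_inventory(conn, "C02A", "lab-01", "iMac", 8)
report_inventory(conn, "C02B", "lab-02", "iMac", 16)
report_inventory(conn, "C02C", "office-01", "MacBook Pro", 16)
report_inventory(conn, "C02A", "lab-01", "iMac", 16)  # same machine, upgraded RAM

print(hostnames_for_model(conn, "iMac"))
```

With a client/server database such as PostgreSQL or MySQL, concurrent check-ins from many machines are handled by the engine, which addresses the concern about several clients writing to one CSV file at once.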