Multitenancy: Single database or database per tenant? [closed] - mysql

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
We are in the planning stages for a new multi-tenant SaaS app and have hit a deciding point. When designing a multi-tenant application, is it better to go for one monolithic database that holds all customer data (Using a 'customer_id' column) or is it better to have an independent database per customer? Regardless of the database decisions, all tenants will run off of the same codebase.
It seems to me that having separate databases makes backups / restorations MUCH easier, but at the cost of increased complexity in development and upgrades (Much easier to upgrade 1 database vs 500). It also is easier / possible to split individual customers off to separate dedicated servers if the situation warrants the move. At the same time, aggregating data becomes much more difficult when trying to get a broad overview of how customers are using the software.
We expect to have less than 250 customers for at least a year after launch, but they will be large customers and more will follow afterward.
As this is our first leap into SaaS, we are definitely looking to do this right from the start.

This is a bit long for a comment.
In most cases, you want one database with a separate customer id column in the appropriate tables. This makes it much easier to maintain the application. For instance, it much easier to replace a stored procedure in one database than in 250 databases.
In terms of scalability, there is probably no issue. If you really wanted to, you could partition your tables by client.
There are some reasons why you would want a separate database per client:
Access control: maintaining access control at the database level is much easier than at the row level.
Customization: customizing the software for a client is much easier if you can just work in a single environment.
Performance bottlenecks: if the data is really large and/or there are really large numbers of transactions on the system, it might be simpler (and cheaper) to distribute databases on different servers rather than maintain a humongous database.
However, I think the default should be one database because of maintainability and consistency.
By the way, as for backup and restore. If a client requires this functionality, you will probably want to write custom scripts anyway. Although you could use the database-level backup and restore, you might have some particular needs, such as maintaining consistency with data not stored in the database.

Related

Should I use many database in one application? [duplicate]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
We are in the planning stages for a new multi-tenant SaaS app and have hit a deciding point. When designing a multi-tenant application, is it better to go for one monolithic database that holds all customer data (Using a 'customer_id' column) or is it better to have an independent database per customer? Regardless of the database decisions, all tenants will run off of the same codebase.
It seems to me that having separate databases makes backups / restorations MUCH easier, but at the cost of increased complexity in development and upgrades (Much easier to upgrade 1 database vs 500). It also is easier / possible to split individual customers off to separate dedicated servers if the situation warrants the move. At the same time, aggregating data becomes much more difficult when trying to get a broad overview of how customers are using the software.
We expect to have less than 250 customers for at least a year after launch, but they will be large customers and more will follow afterward.
As this is our first leap into SaaS, we are definitely looking to do this right from the start.
This is a bit long for a comment.
In most cases, you want one database with a separate customer id column in the appropriate tables. This makes it much easier to maintain the application. For instance, it much easier to replace a stored procedure in one database than in 250 databases.
In terms of scalability, there is probably no issue. If you really wanted to, you could partition your tables by client.
There are some reasons why you would want a separate database per client:
Access control: maintaining access control at the database level is much easier than at the row level.
Customization: customizing the software for a client is much easier if you can just work in a single environment.
Performance bottlenecks: if the data is really large and/or there are really large numbers of transactions on the system, it might be simpler (and cheaper) to distribute databases on different servers rather than maintain a humongous database.
However, I think the default should be one database because of maintainability and consistency.
By the way, as for backup and restore. If a client requires this functionality, you will probably want to write custom scripts anyway. Although you could use the database-level backup and restore, you might have some particular needs, such as maintaining consistency with data not stored in the database.

Database Design: When should I use multiple database? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
Background:
I'm trying to build the backend services of an app. This app has rooms where a user can join in to. When a user joins a room, s/he can exchange some data with other users through socket. Outside of the room, the user can view the processed data from the transactions that happened inside the rooms s/he joined in to. Room list, room informations, and the data transactions inside the room should be stored in a database.
My idea is to create a project with one database, however, an experienced developer suggested to the database into two:
One that uses MongoDB for storing data transactions happening inside the room.
One that uses MySQL for storing and returning room list, room information and analytics of what happened inside a room.
Problem I see with using multiple database:
I did some research and from what I understand, multiple database is not recommended but could be implemented if data are unrelated. Displaying analytics will need to process data transactions that happened inside a room and also display the room's information. If I use the two database approach, I will need to retrieve data from both database in order to achieve this.
Question:
I personally think it's easier to use a single database approach since I don't see the data inside and outside of the room as 'unrelated'. Am I missing an important point on when to use multiple database?
Thanks in advance. Have a good day.
You can look at this problem from two perspectives; technical and practical.
Technically, if your back-end is expected to become very complex or scaled, it is recommended to break it down into multiple microservices, each in charge of a tiny task. Ideally, these small services should help you achieve separation of concerns, so each service only works with one piece of the data. If other services need to read and/or modify that piece of data, they have to go through the service in charge.
In this case, depending on the data each service is dealing with, you can pick the proper database. For instance, if you have transactional data, you can use MySQL, MongoDB for large schemaless content, or Elasticsearch if you want to perform a text search.
Practically, it is recommended to start small with one service and database (if you prefer to develop your app sooner), and then, over time break it down into multiple services as you need to add and/or improve features.
There are multiple points to keep in mind. First, if you expect to have a large user base, you should start development with the right architecture from the beginning to avoid scaling issues. Second, sometimes one database cannot perform the task you need. For example, it would be very inefficient to do a text search in MySQL. Finally, there is no absolute right way of doing things. However you do one thing, another friend might show up tomorrow and ask you why you did not do it his/her way. The most important thing is to start doing and then, learning and improving along the way. Facebook was started with MySQL and it worked fine initially. Could it survive one database type today? I suspect the answer is no, but would it have made sense for them to add all the N databases that they have now back then?
Good luck developing!

MySQL performance issues in my mind [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 5 years ago.
Improve this question
First of all, I'm not a very experienced developer, I'm making mid-size apps in PHP, MySQL and Javascript.
There is something though which is making it hard for me to design a MySQL InnoDB database before each project. And that is the performance. I'm always quite worried about if I'm creating a normalized database scheme that when I'll have to join a couple of tables (like 5-6) together (there are usually a few many-to-many, many-to-one relationships between them) it will affect the performance a LOT (in negative) when each of these 5-6 tables has around 100k rows.
These projects that I usually have is creating analytics platforms. Therefore I'm expecting around 100M of clicks in total and I usually have to join this table to many others (each around 100k of rows) to get some data displayed. I'm usually making summarized tables of the clicks but cannot do the same for the other tables.
I'm not quite sure if I have to worry about future performance in this stage. Currently, I am actively managing a few of these applications with 30M+ clicks and tables that I join to this Clicks table with 40k+ rows. The performance is pretty bad - a select operation usually takes more than 10-20s to complete while I believe I have proper indexing, innodb_buffer_pool_size also.
I've read a lot about the key to having an optimized database is the design. That's why I'm usually thinking about the DB scheme a LOT before creating it.
Do I really have to worry about creating DB schemes where I'll have to Join 5-6 many-to-many/many-to-one/one-to-many tables or it's quite usual and MySQL should be able to easily handle this load?
Is there anything else that I should consider before creating a DB scheme?
My usual server setup is having a MySQL Server with 4GB RAM + 2 vCPUs, to serve the DB and a WebServer with 4GB RAM + 2 vCPUs. Both of them are using Ubuntu's 16.04 release and using the latest MySQL (5.7.21) and PHP7-fpm.
Gordon is right. RDBMSs are made to handle your kind of workload.
If you're using virtual machines (cloud, etc) to host your stuff, you can generally increase your RAM, vCPU count, and IO capacity simply by spending more money. But, usually, throwing money at DBMS peformance problems is less helpful than throwing better indexes at them.
At the scale of 100M rows, query performance is a legitimate concern. You will, as your project develops, need to revisit your DBMS indexing to optimize the queries you're actually using. So plan on that. The thing is, you cannot and will not know until you get lots of data what your actual performance issues will be.
Read this for a preview of what's coming: https://use-the-index-luke.com/ .
One piece of advice: partitioning of tables generally doesn't solve performance problems except under very specific circumstances.
Look up this acronym: YAGNI.
And go do your project. Spend your present effort getting it working.

Planning for database scaling and schema changes [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
I'm doing research before I create my social network database and I've found a lot of questions/resources pertaining to graph and key-value databases for social networks. I understand there are a TON of different options and ways to implement the DB. I also understand that what the big companies do is complex and way above what I currently need (1b+ users). I also know each of the big companies have revamped their databases to account for the insane scaling they go through.
Because I don't know how the network will grow, and I don't believe I can accurately create a model that will scale to 1m users (due to unknowns such as how people will use it, how often people post, comment, etc). But I can at least try to create a database that will be easiest to scale when (if) the need arises.
Do most companies create a database to handle up to 1k users, then once they grow, they revamp it for 10k users, then 100k, etc? If they do, at each of these arbitrary numbers (because of the unknowns listed above), do companies typically change a few tables/nodes/etc, or do they completely recreate the database to take advantage of new technologies (such as moving from SQL to graph)?
I want to pick the best solution, but I'm finding the decision between graph, key-value, SQL, among others very difficult--especially with no data to know what relationships/data is most important. I believe I can create a solid system using a graph that can support up to 10k users, but I'm worried having to potentially completely reacreate the database as the system grows. Is this a worry now to avoid issues, or implement now and adapt later type problem?
Going further, if I do need to plan on complete DB restructures, does it typically make sense to use a Multi-Model NoSQL DBMS (such as OrientDB or ArangoDB)?
I personally think you are asking premature questions.
Seriously, even with a bad model, a database can handle 10k users.
You think about scaling, but the hardest problem is not scaling, it is to come to the point where you need to scale.
I'm sure everybody wants 1bn users, but then you are already dreaming about having a social network with 200 times more users than Github itself ? (Github has ~ 5 million users).
Also, even by thinking it ahead, you will refactor and refactor again definitely during years, and you will have more than one persistence layer, be sure of it.
Code and code good, stay lean, remain able to change quickly, deploy, show to users, refactor, test, deploy and show to users in the same day. These are the things you need to do now, not asking questions about a problem you don't have yet, you definitely have a lot of other problems to solve now ;-)
UPDATE
Based on your comment, you might need to think that there are questions we just can not simply answer, because we don't need your exact requirements.
I have a simple app, which uses 4 persistence layers, and this app is not yet online. I'll give you my "why" about using it and which use case :
Neo4j : it is the core of the application data, I use it because I love it, I know it very much (it is my job) and, as the concept of the app is quite new and can evolve rapidly, having a schemaless db is reducing a lot of the refactoring stuff. Also I have now a lot of use cases coming by building the app, which make Neo4j a good choice when you need to add features without breaking what has already been done.
MySQL
I use it for User accounts and profiles. Why ? Because the framework I use already has a lot of bundles integrating this kind of stuff in a couple of lines of code, the bundles are well maintained and if I would use (currently) neo4j for it, I will have to reinvent the wheel. Also all the modules I use evolve in stability and compatibility with the framework.
Of course the mysql data is coupled (minimally) with the neo4j one. But I know that this kind of data will not evolve that much, so Mysql is a good choice and in case I have to refactor some points, this will not be a huge pain.
Redis
I use Redis for storing analytics data, Redis is quite flexible and I can easily create new keys and add data on top of it.
RabbitMQ :
I use a lot of message queues, why ? For testing refactoring. I can easily process messages with multiple consumers for testing "refactoring", testing mutliple database layers while the app is running for testing changes, testing new features, testing refactoring, ...
You will refactor ! Just try to keep it as simple as possible.

Why is it (probably) a bad idea to process database data with programming languages like C / Python? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
So I'm working on a task of calculating medians for every 100 records in a giant MySQL table, which appears to be a straightforward problem but ends up with very complex SQL code. One of my friend who saw my work asked me, why don't you load the data into memory and process it with C or Python, wouldn't that be easier? My intuition is that it is a bad idea. But can someone elaborate more about why it is not suggested? Thank you!
I can think of no good reason to tell you that it's a bad idea to use a front-end to process data stored in a MySQL db... to me it's something like "don't use knives to cut your food because you can cut your own finger".
You can, of course write some stored procedures or functions that might give you the results you need, but if you can't make it work with MySQL, then the obvious step is to use another tool.
You must, however, take some precautions:
Don't overload your network connection (trivial if you are working on localhost).
Don't try to store too big resultsets in memory: keep it simple and small (divide and conquer)
Let the database server do the heavy work, and use your front-end to do the fine work (if you need to filter data, let MySQL do that for you, and write the code to do the calculations on the filtered data).
Be sure to take the appropriate precautions when sending queries to MySQL (avoid SQL injection vulnerabilities)
In general, yes you should do your heavy lifting in the database. If your dataset is fairly small, it wouldn't matter whether you do the calculations on the database server or on the database client.
The primary consideration whether to do the calculation in the db server vs on the db client is usually of performance. If you do the heavy calculations on the db client, you may end up having to transfer a lot of data through the db connection. With large datasets, transferring the entire table to the client may become performance issues, and if your database server lives in a different machine than your application server (i.e. not localhost), then the network transfer overhead becomes even worse.
If you have to transfer the entire dataset anyway, then there likely won't be any significant performance difference. The SQL language itself isn't inherently faster than the client languages for doing number crunching, it simply has the advantage of running on the server process and thus can avoid the overhead of data transfer.
There are also applications that uses multiple data sources, for these, generally you'll will often up with no other choice but to do parts your calculations the client side.
Ultimately, you have to measure. It didn't matter whether it's best practice or not, if doing the calculation in the client is fast enough and it simplify the overall code doing that, then do take that route.