Should I split the data between multiple databases or keep them in a single one?

Should I split the data between multiple databases or keep them in a single one? - mysql

I'm creating a multi-user/company web application in PHP & MySQL. I'm interested to know what the best practice is with regards to structuring my database(s).
There will be hundreds of companies and thousands of users of this web app so this needs to be robust. Each company won't be able to see other companies data, just their own. We will be storing mainly text data and will probably only be a few MB per company.
Currently the database contains 14 tables (for one sample company).
Is it better to put the data for all companies and their users in a single database and create a unique companyID for each one?
or:
Is it better to put each company's data in its own database and create a new database and table set for each new company that I add?
What are the pluses and minuses to each approach?
Thanks,
Stephen

If a single web app is being used by all the different companies, unless you have a very specific need or reason to use separate databases (it doesn't sound like you do), then you should definitely use a single database.
Your application will be responsible for only showing the correct information to the correct authenticated users.

Multiple databases would be a nightmare to maintain. For each new company you'd have to create and administer each one. If you make a change to one schema, you'll have to do it to your 14+.
Thousands of users and thousands of apps shouldn't pose a problem at all as long as you're using something that is a real database and not Access or something silly like that.

Multi-tenant
Pluses
Relatively easy to develop: only change database code in one place.
Lets you easily create queries which use data for multiple tenants.
Straightforward to add new tenants: no code needs to change.
Transforming a multi-tenant to a single-tenant setup is easy, should you need to change your design.
Minuses
Risk of data leak between tenants if coding is sloppy. Tenant view filters can in some cases be employed to reduce this risk. This method is based on using different database user accounts for different tenants.
If you break the code, all tenants will be affected.
Single-tenant
Pluses
If you have very different requirements for different tenants, several different database models can be beneficial. This is the best case for using a single tenant setup.
If you code sloppily, there's practically no risk of data leak between tenants (tenant A will not be able to access tenant B's data). In addition, if you accidentally destroy the schema of one tenant through a botched update, other tenants will remain unaffected.
Less SQL code when you don't need to take tenant ID values into account in your queries
Minuses
Database schemas tend to differentiate with time, often resulting in a nightmare. Using a database compare tool, you can alleviate this problem, but potentially many schemas need to be compared.
Including data from several databases in one query is typically complex, and often requires prepared statements.
Developing is hard, since you need to make the same changes to multiple schemas.
The same database entity can appear in many databases with different ID keys, resulting in confusion.
Transforming a single-tenant to a multi-tenant setup is very hard, should you need to change your design.

A single database is the relational way. One aspect from this perspective is that databases gather statistics about database usage and make heavy use of this. If you split things up you will be shooting yourself in the foot as the statistics will be fragmented.

Related

Postgres shared schema multi-tenant setup for e-commerce SaaS

I've researched a lot for the best multi-tenant setup of an e-commerce project but could not find a fitting answer. I am leaning to use a shared database separate schema setup with either MySQL or PostgreSQL. The structure of the tables is the same for all the tenants. I really like that with that setup the application code doesn't need to provide an extra WHERE clause for every query, so it is very developer friendly!
Now you also have the shared schema approach and that is what I am currently using, but I feel this is bad for security and isolation purposes. I would like to move to a different solution.
The app will be used by +- 100 webshops (a tenant) this year and I expect it to grow to into the thousands. Webshops ranges from small to large, so it is important that I can later pick out a specific shops data and put it in it's own database server.
Since I don't have any experience with a separate schema setup I would like to know if this would benefit me. What issues might I have when walking that path? Especially with changes in the structure of the tables, this is what bothers me the most. What is the limit of schema's to use in a separate schema approach using PostgreSQL (I will have 100-1000 schemas) before it would be a pain to manage?

The following is my take
Blockquote
When using the SharedDatabase with same table for all tenants, isolation happening with tenantid column is easy because you always add a filter that says tenantid = LoggedInTenantId [LoggedInTenantId => set during login]. When you have a base method in your ORM like EF [I'm from .Net], this would auto append to any query that goes out of the code.
When you opt for shared Schema, if there is a use-case like sharing data between tenants [webshops], it is not feasible. Else, if you have an accountant that wants to audit a collection of tenant's and wants a dashboard to view the a/c statistics etc, it becomes impossible
With the scaling point of view, you can better go for a separate db per tenant if a single tenant or a collection of them wants to scale out. This will be better than managing schemas.
Consider the use-cases that you may have for your product and share here so that we can take this discussion forward.
HTH

Best database model for saas application (1 db per account VS 1 db for everyone)

Little question, I'm developing a saas software (erp).
I designed it with 1 database per account for these reasons :
I make a lot of personalisation, and need to add specific table columns for each account.
Easier to manage db backup (and reload data !)
Less risky : sometimes I need to run SQL queries on a table, in case of an error with bad query (update / delete...), only one customer is affected instead of all of them.
Bas point : I'm turning to have hundreds of databases...
I'm hiring a company to manage my servers, and they said that it's better to have only one database, with a few tables, and put all data in the same tables with column as id_account. I'm very very surprised by these words, so I'm wondering... what are your ideas ?
Thanks !
Frederic

The current environment I am working in, we handle millions of records from numerous clients. Our solution is to use Schema to segregate each individual client. A schema allows you to partition your clients into separate virtual databases while inside a single db. Each schema will have an exact copy of the tables from your application.
The upside:
Segregated client data
data from a single client can be easily backed up, exported or deleted
Programming is still the same, but you have to select the schema before db calls
Moving clients to another db or standalone server is a lot easier
adding specific tables per client is easier (see below)
single instance of the database running
tuning the db affects all tenants
The downside:
Unless you manage your shared schema properly, you may duplicate data
Migrations are repeated for every schema
You have to remember to select the schema before db calls
hard pressed to add many negatives... I guess I may be biased.
Adding Specific Tables: Why would you add client specific tables if this is SAAS and not custom software? Better to use a Postgres DB with a Hstore field and store as much searchable data as you like.
Schemas are ideal for multi-tenant databases Link Link
A lot of what I am telling you depends on your software stack, the capabilities of your developers and the backend db you selected (all of which you neglected to mention)
Your hardware guys should not decide your software architecture. If they do, you are likely shooting yourself in the leg before you even get out of the gate. Get a good senior software architect, the grief they will save you, will likely save your business.
I hope this helps...
Bonne Chance

Dedicated database for each user vs single database for every user

I'll be soon developing a big cms where users can configure their website managing news, products, services and much more about their company.
Think about a shopify without the ecommerce part (at least for now).
The rdbms is MySQL and the user base will be about 150 (maybe bigger).
I'm trying to figure out which one of these two approaches would fit better.
DEDICATED DATABASE FOR EACH USER
PROS:
performance (and possible future sharding?): is querying smaller database with just your data better than querying a giant database with every user data?
easy "export my data" for users: I can simply dump their own db without fetching everything and putting it in some big encoded logical datastruct
SINGLE DATABASE FOR EVERY USER
PROS:
less general overhead
statistic: just one db to query to get and aggregate whatever I need
backup: one dump (not sure about this one because I've no experience in cluster dumping)
Which way would you go for? I don't think shopify created a dedicated database for any user registered... or maybe they did?
I'd like more experienced people than me to help me figuring out the best way and all the variables I can not guess right now because of my ignorance.

It sounds like you're developing a software-as-a-service hosted system, rather than a software package to distribute to customers for them to run on their own servers. In that case, in general, you will have an easier time developing and administering your service if you design it for a single database handling multiple users.
You'll be able to add new users to your system with data manipulation language (DML) rather than data definition language (DDL). That is, you'll insert rows for new users rather than create tables. That will make your life a LOT easier when you go live.
You are absolutely right that stuff like backups and aggregate reporting will be far easier if you have a single shared database.
Don't worry too much about the user data export functions. You'll have to develop software for those functions anyway; it won't be that hard to filter by user when you do the export.
But there's a downside you should consider to the single-database approach: if part of your requirement is to conceal various users' existence or data from each other, you'll have to be very careful to do this in your development. Will your users be competitors with each other? That could be tricky. You'll need to trust your in-house admin and support teams to refrain from disclosing one user's data to another by mistake (or deliberately). With a separate database per user, you'll have a smaller risk in that area.
150 users aren't many. Don't worry about scalability until you have a workload of paying customers. When that happens you can add MySQL server RAM, partitions, solid-state disks, replication, memcached, sharding, and all that other expensive and high-workload stuff. If you add those things before you go live, you'll just take longer and blow more money before you go live. Not good.

Application design single vs multiple databases

I was reading the answers for some of the similar questions but it doesn't quite help me understand if I made the right decision.
I have a web application for managing sales. I decided that every company should have their own database.
They need to backup their own data (this is easy to do if they have their own database)
It is pretty easy to scale if I grow. As I can move databases to different servers.
What are some pros + cons of having multiple databases vs a single?

If the data is self contained within in each database then I think your approach is sound.
The only con I can think of is that if you roll out a schema change to one/more of the tables then you're going to have to apply it to each and every customer's database. This could be an advantage if you want to update some and not others.
The pro that you mentioned about it being easy to scale by splitting customers between MySQL instances is a big pro. I designed a similar application a while ago which uses exactly the same approach.

Are the companies completely isolated from each other? What I mean, is it possible that you may want to query data that covers more than one company. Say a sales report for the year ordered by company? In that case a single database will be more suitable.
Also, as you add more companies are you going to have to add a new database, create tables, then have your application programs be modified to make use of these different databases? That sounds like a maintenence nightmare.
Scaling with MySQL should not be a problem as there are various replication tools that are available.
Backing up a single company should not be a problem either with a little code.

Say you have 10,000 clients and you need to change the structure of table Customers, how are you going to change all the 10K databases?

Organizing a MySQL Database

I'm developing an application that will require me to create my first large-scale MySQL database. I'm currently having a difficult time wrapping my mind around the best way to organize it. Can anyone recommend any reading materials that show different ways to organize a MySQL database for different purposes?
I don't want to try getting into the details of what I imagine the database's main components will be because I'm not confident that I can express it clearly enough to be helpful at this point. That's why I'm just looking for some general resources on MySQL database organization.

The way I learned to work these things out is to stop and make associations.
In more object oriented languages (I'm assuming you're using PHP?) that force OO, you learn to think OO very quickly, which is sort of what you're after here.
My workflow is like this:
Work out what data you need to store. (Customer name etc.)
Work out the main objects you're working with (e.g. Customer, Order, Salesperson etc), and assign each of these a key (e.g. Customer ID).
Work out which data connects to which objects. (Customer name belongs to a customer)
Work out how the main objects connect to each other (Salesperson sold order to Customer)
Once you have these, you have a good object model of what you're after. The next step is to look at the connections. For example:
Each customer has only one name.
Each product can be sold multiple times to anybody
Each order has only one salesperson and one customer.
Once you've worked that out, you want to try something called normalization, which is the art of getting this collection of data into a list of tables, still minimizing redundancy. (The idea is, a one-to-one (customer name) is stored in the table with the customer ID, many to one, one to many and many to many are stored in separate tables with certain rules)
That's pretty much the gist of it, if you ask for it, I'll scan an example sheet from my workflow for you.

Maybe I can provide some advices based on my own experience
unless very specific usage (like fulltext index), use the InnoDB tables engine (transactions, row locking etc...)
specify the default encoding - utf8 is usually a good choice
fine tune the server parameters (*key_buffer* etc... a lot of material on the Net)
draw your DB scheme by hand, discuss it with colleagues and programmers
define data types based not only on the programs usage, but also on the join queries (faster if types are equal)
create indexes based on the expected necessary queries, also to be discussed with programmers
plan a backup solution (based on DB replication, or scripts etc...)
user management and access, grant only the necessary access rights, and create a read-only user to be used by most of queries, that do not need write access
define the server scale, disks (raid?), memory, CPU
Here are also some tips to use and create a database.

I can recomend you the first chapter of this book: An Introduction to Database Systems, it may help you organize your ideas, and I certainly recomend not using 5th normal form but using 4th, this is very important.

If I could only give you one piece of advice, that would be to generate test data at similar volumes as production and benchmark the main queries.
Just make sure that the data distribution is realistic. (Not all people are named "John", and not all people have unique names. Not all people give their phone nr, and most people won't have 10 phone numbers either).
Also, make sure that the test data doesn't fit into RAM (unless you expect the production data volumes to do too).

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008