What are best practices for partitioning DocumentDB across accounts? - partitioning

I am developing an application that uses DocumentDB to store customer data. One of the requirements is that we segregate customer data by geographic region, so that US customers' data is stored within the US, and European customers' data lives in Europe.
The way I planned to achieve this is to have two DocumentDB accounts, since an account is associated with a data centre/region. Each account would then have a database, and a collection within that database.
I've reviewed the DocumentDB documentation on client- and server-side partitioning (e.g. 1, 2), but it seems to me that the built-in partitioning support will not be able to deal with multiple regions. Even though an implementation of IPartitionResolver could conceivably return an arbitrary collection self-link, the partition map is associated with the DocumentClient and therefore tied to a specific account.
Therefore it appears I will need to create my own partitioning logic and maintain two separate DocumentClient instances - one for the US account and one for the Europe account. Are there any other ways of achieving this requirement?

Azure's best practices on data partitioning says:
All databases are created in the context of a DocumentDB account. A
single DocumentDB account can contain several databases, and it
specifies in which region the databases are created. Each DocumentDB
account also enforces its own access control. You can use DocumentDB
accounts to geo-locate shards (collections within databases) close to
the users who need to access them, and enforce restrictions so that
only those users can connect to them.
So, if your intention is to keep the data near the user (and not just to store it separately), your only option is to create different accounts. Luckily, billing is per collection rather than per account.
DocumentDB's resource model gives the impression that you cannot (at least out of the box) mix DocumentDB accounts. Partition keys do not appear to help either, since partitioning can only happen within a single account.
Maybe this sample will help you or give you some hints.
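The "one client per regional account" idea from the question can be sketched roughly as follows. This is a minimal illustration, not the DocumentDB SDK: `AccountConfig`, `RegionRouter`, the endpoints, and the keys are all hypothetical names, and the factory returns a plain dictionary where a real application would construct its SDK client.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Generic, TypeVar

Client = TypeVar("Client")


@dataclass(frozen=True)
class AccountConfig:
    """Connection settings for one regional account (placeholder values)."""
    endpoint: str
    key: str


class RegionRouter(Generic[Client]):
    """Keeps one lazily created client per region and routes by customer region."""

    def __init__(self, accounts: Dict[str, AccountConfig],
                 client_factory: Callable[[AccountConfig], Client]) -> None:
        self._accounts = accounts
        self._client_factory = client_factory
        self._clients: Dict[str, Client] = {}

    def client_for(self, region: str) -> Client:
        # Fail loudly rather than silently storing data in the wrong region.
        if region not in self._accounts:
            raise KeyError(f"no account configured for region {region!r}")
        # Create the client on first use and reuse it afterwards.
        if region not in self._clients:
            self._clients[region] = self._client_factory(self._accounts[region])
        return self._clients[region]


# Hypothetical account settings; substitute real endpoints and keys.
ACCOUNTS = {
    "us": AccountConfig("https://example-us.invalid:443/", "us-key"),
    "eu": AccountConfig("https://example-eu.invalid:443/", "eu-key"),
}

# Stand-in factory: a real application would build its SDK client here.
router = RegionRouter(ACCOUNTS, client_factory=lambda cfg: {"endpoint": cfg.endpoint})
```

The application would look up the customer's region once (for example from the customer profile) and call `router.client_for(region)` for every read and write. Raising on an unknown region is a deliberate choice here, since a silent fallback could violate the data-residency requirement.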

Related

Design database schema to support multi-tenant in MYSQL

I'm working on school management software in ASP that connects to a MySQL database. The software works well when I deploy it on a local machine for each user (school), but I want to migrate it to the Azure cloud. Users will each have an account to connect to the same app, but one school's data must not mix with another's. My problem is finding the best way to deploy and manage the database.
Should I deploy one database for each school?
Or keep all schools' data in the same database?
I'm not sure either of these is the best approach.
I don't want, for example, a STUDENT table that holds students for school X, school Y, and so on.
Please help me find the best solution.
There are multiple possible ways to design schema to support multi-tenant. The simplicity of the design depends on the use case.
Separate the data of every tenant (school) physically, i.e., one schema must contain data related to only a specific tenant.
Pros:
Easy for A/B Testing. You can release updates which require database changes to some tenants and over time make it available for others.
Easy to move the database from one data-center to another. Support different SLA for backup for different customers.
Per tenant database level customization is easy. Adding a new table for customers, or modifying/adding a field becomes easy.
Third party integrations are relatively easy, e.g., connecting your data with Google Data Studio.
Scaling is relatively easy.
Retrieving data from one tenant is easy, without worrying about mixing up foreign key values.
Cons:
When you have to modify any field/table, then your application code needs to handle cases where the alterations are not completed in some databases.
Retrieving analytics across customers becomes difficult. Designing Queries for usage analysis becomes harder.
When integrating with other database systems, especially NoSQL, you will need more resources. For example, indexing data in Elasticsearch for every tenant will require an index per tenant, and if there are thousands of customers, it will result in creating thousands of shards.
Common data across tenants needs to be copied into every database.
Separate data for every tenant (school) logically, i.e., one schema contains data for all the tenants.
Pros:
Software releases are simple.
Easy to query usage analytics across multiple tenants.
Cons:
Scaling is relatively tricky. May need database sharding.
Maintaining the logical isolation of data for every tenant in all the tables requires more attention and may cause data corruption if not handled at the application level carefully.
Designing database systems for the application that support multiple regions is complicated.
Retrieving data from a single tenant is difficult. (Remember: all the records will be associated with some other records using foreign keys.)
This is not a comprehensive list. It is based on my experience working with both types of design. Both designs are common and are used by many organizations, depending on the use case.
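The two designs above can be compared side by side with a small sketch. It uses SQLite from Python's standard library purely for illustration (the question is about MySQL), and the table, column, and function names are made up for the example.

```python
import sqlite3

# Design 1: physical separation. Each school gets its own database;
# two in-memory databases stand in for two separate schemas or servers.
def open_tenant_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    return conn

per_tenant = {"school_x": open_tenant_db(), "school_y": open_tenant_db()}
per_tenant["school_x"].execute("INSERT INTO student (name) VALUES ('Ada')")
per_tenant["school_y"].execute("INSERT INTO student (name) VALUES ('Grace')")

def students_physical(school: str) -> list:
    # No tenant filter is needed: the connection itself selects the tenant.
    rows = per_tenant[school].execute("SELECT name FROM student ORDER BY name")
    return [name for (name,) in rows]

# Design 2: logical separation. One shared table, every row tagged with its tenant.
shared = sqlite3.connect(":memory:")
shared.execute(
    "CREATE TABLE student ("
    "id INTEGER PRIMARY KEY, school_id TEXT NOT NULL, name TEXT NOT NULL)"
)
shared.executemany(
    "INSERT INTO student (school_id, name) VALUES (?, ?)",
    [("school_x", "Ada"), ("school_y", "Grace")],
)

def students_logical(school: str) -> list:
    # Every query must carry the tenant filter; forgetting it leaks other tenants' rows.
    rows = shared.execute(
        "SELECT name FROM student WHERE school_id = ? ORDER BY name", (school,)
    )
    return [name for (name,) in rows]

def students_per_school() -> dict:
    # Cross-tenant analytics is a single query in the shared design.
    return dict(shared.execute("SELECT school_id, COUNT(*) FROM student GROUP BY school_id"))
```

In the physical design the isolation comes from choosing the connection, while in the logical design it depends on every query including the `school_id` filter. That filter is where the "requires more attention" point in the cons list shows up in practice.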

Is it a good idea to create different database for each client in SQL Server?

I have an application in which we want to let users add, update, and delete the columns of different tables. My approach is to create a different database for each client so that their table-specific changes remain in their own database.
Since each client will have their own database, I wonder how I can manage authentication and authorization. Do I need to create a separate database for that as well? Will it affect the performance of the application?
Edit: The approach that I am planning to use for authentication and authorization is to create an additional field called "Account" on the login page. This account name will direct the program to the correct database, and each database will have its own users to authenticate.
The answer to your question is of course (and unfortunately) Yes and No. :)
This is known as multi-tenant data architecture.
Having separate databases can definitely be a great design option. However, so can using one database shared among all of your clients/customers, and you will need to consider many factors before choosing.
Each design has pluses and minuses.
Here are your 3 essential choices:
1) Each customer shares the same database and database tables.
2) Each customer shares the same database but they get their own schema inside the database so they each get their own set of tables.
3) Each customer gets their own database.
One major benefit (that I really like) of the separate database approach is data security. Every customer gets their own database, so they will edit/update/delete only their own database. As a result, there is no risk of end users overwriting other users' data, whether through a programming error on your part or through a security breach in your application.
When all users are in the same database, you could accidentally pull and expose another customer's data. Or, worse, you could expose a record's primary key on screen and forget to secure it appropriately, and a power user could very easily change this key to one that belongs to another customer, thus exposing that client's data.
However, let's say that all of your customers are actually subsidiaries of one large company and you need to roll up financials every day/week/month/year, etc.
If this is the case, then having a database for every client could be a reporting nightmare, and having everyone in a single database sharing tables would make life much easier. When it comes time to report on your daily sales, for instance, it's easier to sum up a column than to go to 10,000 databases and sum them up. :)
So the answer definitely depends on your application and what it will be doing.
I work on a large enterprise system where we have tens of thousands of clients in the same database, and in order to support this we took great care to secure all of our data.
I also work on a side project in my spare time which supports a database per customer multi-tenant architecture.
So, consider what your application will do, how you will back up your data, whether you need to roll up data, etc., and this will help you decide.
Here's a great article on MSDN for this:
https://msdn.microsoft.com/en-us/library/aa479086.aspx
Regarding your question about authentication.
Yes, having a separate database for authentication is a great design. When a customer logs in, you authenticate them against your authentication database and they receive the connection string to their own database as part of this authentication. Then all data from that point comes from that client's database.
Hope this was helpful.
Good luck!
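The authentication-database idea described above could look something like the following sketch. It is only an illustration under stated assumptions: it uses SQLite from Python's standard library instead of SQL Server, and the `login` table, the `register` and `authenticate` helpers, and the sample connection string are all hypothetical.

```python
import hashlib
import hmac
import os
import sqlite3
from typing import Optional

# Central authentication database: maps (account, username) to credentials
# and to the connection string of that account's own database.
catalog = sqlite3.connect(":memory:")
catalog.execute(
    "CREATE TABLE login ("
    "account TEXT NOT NULL, username TEXT NOT NULL, "
    "salt BLOB NOT NULL, password_hash BLOB NOT NULL, "
    "connection_string TEXT NOT NULL, "
    "PRIMARY KEY (account, username))"
)

def _hash(password: str, salt: bytes) -> bytes:
    # Salted PBKDF2 so plain-text passwords are never stored.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

def register(account: str, username: str, password: str, connection_string: str) -> None:
    salt = os.urandom(16)
    catalog.execute(
        "INSERT INTO login VALUES (?, ?, ?, ?, ?)",
        (account, username, salt, _hash(password, salt), connection_string),
    )

def authenticate(account: str, username: str, password: str) -> Optional[str]:
    """Return the tenant's connection string on success, otherwise None."""
    row = catalog.execute(
        "SELECT salt, password_hash, connection_string FROM login "
        "WHERE account = ? AND username = ?",
        (account, username),
    ).fetchone()
    if row is None:
        return None
    salt, stored_hash, connection_string = row
    # Constant-time comparison avoids leaking information through timing.
    if hmac.compare_digest(stored_hash, _hash(password, salt)):
        return connection_string
    return None

# Hypothetical tenant; the connection string format is only an example.
register("acme", "alice", "s3cret", "Server=db1;Database=tenant_acme;")
```

After a successful call to `authenticate`, the application would keep the returned connection string in the user's session and use it for all further data access. In practice the connection string itself would usually be stored encrypted or resolved from a secrets store rather than kept in plain text as it is in this sketch.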

Multi-Tenant Database design - Database for each user

I am working on a web application that will require users to have their own set of private data. My original plan was to create a stores table, a users table, and a user_stores intersecting table. Then I would, in the stores table, save the database name for that store (and create each store-specific database with an application user and password so the web application could always login).
Each store would have similar data (users, products, shipping methods, etc), and I know I can use foreign key references to tie everything together in one giant database. However, being that the data is very specific and potentially proprietary, would it be better to use my original design, or make a single database with everyone's data in there?
I am thinking that, for scaling, separate databases would be better because we could put the more active accounts on their own (or more powerful) database servers and simply add a server location field in the stores table if we needed to. Additionally, it may be more secure because we could add the user login information to the database and only give them access to their data (preventing one user from editing another user's stuff). My question is: are there concerns that I am missing? Just about every post I have read about this says not to use the method I am thinking of, and I am no DBA. Any input would be helpful.
Additional Information:
This will be hosted on a Dedicated Server that I will have root access to. I can create as many MySQL databases as I need to.
I would use a single database for sure. Use the following to get started. There are several reasons to go with a single db, however the biggest reason of all is to save you from a maintenance nightmare. If you have to change the schema, you will have a mess on your hands.
http://msdn.microsoft.com/en-us/library/aa479086.aspx
In a multi-tenant database, database designers think about querying, cost, data isolation and protection, maintenance, and disaster recovery.
Multi-tenant solutions range from one database per tenant ("shared nothing") to one row per tenant ("shared everything"). This SO answer summarizes the tradeoffs. If you're designing a database that falls under some kind of regulatory environment (HIPAA, FERPA, etc.), that regulatory environment might trump all other considerations.
One database per tenant is a defensible decision in some cases. It's not clear whether that's the best answer in your case, though.

Virtual Segregation of Data in Multi-tenant MySQL Database

This is more of a conceptual question so variations on the stack are welcome should they be capable of accomplishing the same concept. We're currently on MySQL and expanding some services out into MongoDB.
The idea is that we would like to manage a single physical database schema/structure so that adjustments, expansions, etc. don't become overly cumbersome as the number of clients using the structure grows into the thousands, tens of thousands, hundreds of thousands, and beyond. However, we would like to segregate their data at this level, rather than simply at the application layer, to provide a more rigid separation. Is it possible to create virtual bins for each client using the same structure, but have their data structurally separated from one another?
The normal way would obviously be adding Client Keys to every row of data either directly or via foreign relationships, but given that we can't foresee with 20/20 how hacks on our system might occur allowing "cross client" data retrieval, I wanted to go a little further to embed the separation at a virtually structural level.
I've also read another post here: MySQL: how to do row-level security (like Oracle's Virtual Private Database)? which uses "views" as a method but this seems to become more work the larger the list of clients.
Thanks!
---- EDIT ----
Based on some of the literature suggested below, here's a little more info on our intent:
The closest of the three situations outlined in the MSDN article provided by @Stennie would be a single database with multiple schemas. The difference is that we're not interested in customizing client schemas after their creation; we would actually prefer they remain locked to the parent/master schema.
Ideally the solution would keep each schema linked to the parent table-set structure rather than simply duplicating it with the hope that any change to the parent or master schema would be cascaded across all client/tenant schemas.
Taking it a step further, in a cluster we could have a single master with the master schema, and each slave replicating from it but with a sharded set of tenants. Changes to the master could then be filtered down through the cluster without interruption and would maintain consistency across all instances, also allowing us to update the application layer faster, knowing that all DBs are compatible with the updated schemas.
Hope that makes sense, I'm still a little fresh at this level.
There are a few common infrastructure approaches ranging from "share nothing" (aka multi-instance) to "share everything" (aka multi-tenant).
For example, a straightforward approach to your "virtual bins" would be to allocate a database per client using shared database servers. This is somewhere in between the two sharing extremes, as your customers would be sharing database server infrastructure but keeping their data and schema separate.
A database-per-client approach would allow you to:
manage authentication and access per client using the database's authentication & access controls
support different database software (you mention using both MySQL which supports views, and MongoDB which does not)
more easily backup and restore data per client
avoid potential cross-client leakage at a database level
avoid excessive table growth and related management issues for a single massive database
Some potential downsides would include:
having more databases to manage
in the case of a database where you want to enforce a certain schema (e.g. MySQL) you will need to apply the schema changes across all your databases or support some form of versioning
in the case of a database which preallocates storage (e.g. MongoDB) you may use more storage per client (particularly if your actual data size is small)
you may run into limits on namespaces or open files
you still have to worry about application and data security :)
If you do some research on multi-tenancy you will find some other solutions ranging from this example (isolated DB per client on shared database server architecture) through to more complex partitioned data schemes.
This Microsoft article includes a useful overview of approaches and considerations: Multi-tenant SaaS database tenancy patterns.
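The downside about applying schema changes across all databases, or supporting some form of versioning, can be sketched as below. This is a minimal illustration using SQLite from Python's standard library, where the built-in `PRAGMA user_version` stores the schema version. On MySQL you would typically keep the version in a small table instead. The `MIGRATIONS` list and all table and tenant names are hypothetical.

```python
import sqlite3
from typing import Dict, List

# Ordered schema changes shared by every tenant database.
# The statement at index i upgrades a database from version i to version i + 1.
MIGRATIONS: List[str] = [
    "CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE product ADD COLUMN price_cents INTEGER NOT NULL DEFAULT 0",
]

def schema_version(conn: sqlite3.Connection) -> int:
    return conn.execute("PRAGMA user_version").fetchone()[0]

def migrate(conn: sqlite3.Connection) -> int:
    """Apply any migrations this database has not seen yet; return its new version."""
    current = schema_version(conn)
    for version, statement in enumerate(MIGRATIONS[current:], start=current + 1):
        conn.execute(statement)
        # PRAGMA does not accept bound parameters; version is always an int here.
        conn.execute(f"PRAGMA user_version = {version}")
        conn.commit()
    return schema_version(conn)

def migrate_all(tenants: Dict[str, sqlite3.Connection]) -> Dict[str, int]:
    """Bring every tenant database up to the latest version."""
    return {name: migrate(conn) for name, conn in tenants.items()}

# Two in-memory databases stand in for two client databases on a shared server.
tenants = {
    "client_a": sqlite3.connect(":memory:"),
    "client_b": sqlite3.connect(":memory:"),
}
versions = migrate_all(tenants)
```

Because each database records its own version, running `migrate_all` again is harmless, and a tenant that was offline during a rollout simply catches up the next time it is migrated. The application can also read the version to decide which features a given tenant database supports.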

what is the proper way to separate data in couchbase

I am thinking of working with couchbase for my next web application, and I am wondering how my data should be structured, specifically the use of buckets. For example, assuming each user is going to have a lot of unique data, should a separate bucket be created for each user (maybe even for different categories of data)? Also I am wondering if there is any real advantage/disadvantage to separating data using buckets (aside from the obvious organizational benefit) instead of simply storing everything in one bucket.
You will not get any performance gain from using more or fewer buckets. The reason that Couchbase has buckets is so that it can be multi-tenant. The best use case I can think of for using multiple buckets is if you are a hosting provider and you want to have different users using the same database server. Buckets can be password protected, which would prevent one user from accessing another user's data.
Some people create multiple buckets for organizational purposes. Maybe you are running two different applications and you want the data to be separate or as you mentioned maybe you want to split data by category.
In terms of management, though, it is probably best to create as few buckets as possible for your application, since it will simplify your client logic by reducing the number of connections you need to Couchbase from your web tier (client). For each bucket you have, you must create a separate client connection.