I am thinking of working with Couchbase for my next web application, and I am wondering how my data should be structured, specifically with regard to the use of buckets. For example, assuming each user is going to have a lot of unique data, should a separate bucket be created for each user (maybe even for different categories of data)? Also, I am wondering if there is any real advantage/disadvantage to separating data using buckets (aside from the obvious organizational benefit) instead of simply storing everything in one bucket.
You will not get any performance gain from using more or fewer buckets. The reason Couchbase has buckets is so that it can be multi-tenant. The best use case I can think of for multiple buckets is if you are a hosting provider and you want different users sharing the same database server. Buckets can be password protected, which would prevent one user from accessing another user's data.
Some people create multiple buckets for organizational purposes. Maybe you are running two different applications and want the data kept separate, or, as you mentioned, maybe you want to split data by category.
In terms of management, though, it is probably best to create as few buckets as possible for your application, since that will simplify your client logic by reducing the number of connections you need to Couchbase from your web tier (client). For each bucket you have, you must create a separate client connection.
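To illustrate that connection point, here is a minimal sketch in Python (Couchbase SDK 3.x-style imports, which vary between SDK versions; the cluster address, credentials, and bucket name are made up). One bucket means one client connection for the whole application, with users kept apart by key prefixes rather than by bucket:

```python
# Minimal sketch: one bucket, one connection, per-user separation via key prefixes.
# SDK 3.x-style imports; adjust to your SDK version.
from couchbase.cluster import Cluster, ClusterOptions
from couchbase.auth import PasswordAuthenticator

cluster = Cluster(
    "couchbase://db.example.com",  # hypothetical cluster address
    ClusterOptions(PasswordAuthenticator("app_user", "app_password")),
)
bucket = cluster.bucket("app-data")          # a single bucket for all users
collection = bucket.default_collection()

# Each user's documents share a key prefix instead of living in their own bucket.
collection.upsert("user::1001::profile", {"name": "Alice"})
collection.upsert("user::1002::profile", {"name": "Bob"})

print(collection.get("user::1001::profile").content_as[dict])
```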
Related
I'm relatively new to databases and have been looking into a solution that will allow users on my server to access their own data and no one else's. I want each user's data store to be scalable, so that if more space is needed to store files, it can grow with little intervention.
I was looking into MySQL, as I have only done single-database work with it, and was trying to see how this could potentially be done. Would the best course of action be to set up a database for each user? That way the tables in each database are separate, password protected, and cannot exchange data with the tables in other databases. I know there can be essentially unlimited databases and tables, in addition to table sharding/partitioning, so I think this is a solid choice, but I was wondering if anyone who has worked with MySQL more had any input.
Thanks
EDIT: Update for clarification of what I'm after. What I essentially want is a platform where I am the owner, but users can log in to my platform to access their data. This data will probably mostly be files, such as PDFs; I can't predict their size, so I am planning for the worst. Users will use a web application to view, download, upload, sort, and delete these files. So in addition to creating files, there will be the ability to see historic files and download those as well if desired. What my platform provides is the framework around these files, with fields auto-filled where I can, as well as the UI for the file management. My concern is the architecture of having multiple users whose data must be kept separate and scalable, without the read/write load crashing the server.
It sounds like you are looking to store the users' "files" as BLOBs in the database, which doesn't necessarily lend itself to scaling well in the first place. Depending on the type of files, generally the best solution would be to provide security in the application layer and use cloud-based storage for the files themselves. If you need an additional layer of security (i.e. users can only access the files assigned to them) there are a number of options. One such option, for example, assuming you were using S3, would be to use IAM profiles, which could be generated when the user account is set up. The same would apply for any third-party cloud storage with an API.
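As a hedged sketch of that idea (security in the application layer, files in cloud storage), here is one way it might look with S3 and boto3; the bucket name, key layout, and expiry are illustrative assumptions:

```python
# Sketch: the app checks ownership first, then hands out a short-lived download link.
import boto3

s3 = boto3.client("s3")

def download_url_for(user_id: int, filename: str) -> str:
    # Each user's files live under their own key prefix in one bucket.
    key = f"users/{user_id}/{filename}"
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "my-app-files", "Key": key},  # hypothetical bucket name
        ExpiresIn=300,  # link valid for 5 minutes
    )
```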
Having an individual database per user would be an administrative nightmare, unless you could generate each database on login (which would mean a separate data store for credentials anyway, so it would be somewhat pointless), and it also would not work in a BLOB storage scenario.
If you can detail a little more precisely what you are trying to achieve and why, there will for sure be plenty of answers.
Application 1: Suppose I have a Twitter-like application. It needs multiple databases/schemas (say, one to store user info, one for user logging, etc.).
Application 2: Suppose I have a blog that also needs logically separated DBs (again, one to store user info, one for user logging, etc.).
How can I use the same MySQL instance as the datastore for both? Since each application has several similar DBs, there is a chance of confusing database or table names unless I keep long names like twitter_users and blog_users.
Is there any effective solution within MySQL?
Another way is to use MaxScale as a DB proxy. It has a rewrite engine, where you can configure a schema-name rewrite for one of the applications. The benefit is that you can use a single MySQL/MariaDB instance and dedicate the whole memory to it.
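For the naming concern itself, note that a single MySQL/MariaDB instance already namespaces tables by database, so neither application needs prefixed table names whether or not a proxy sits in front. A minimal sketch with mysql-connector-python (host, credentials, and database names are illustrative):

```python
import mysql.connector

# Two databases on the same instance; each application connects to its own.
twitter_db = mysql.connector.connect(
    host="localhost", user="twitter_app", password="...", database="twitter"
)
blog_db = mysql.connector.connect(
    host="localhost", user="blog_app", password="...", database="blog"
)

cur = twitter_db.cursor()
cur.execute("SELECT COUNT(*) FROM users")   # resolves to twitter.users
print("twitter users:", cur.fetchone()[0])
cur.close()

cur = blog_db.cursor()
cur.execute("SELECT COUNT(*) FROM users")   # resolves to blog.users
print("blog users:", cur.fetchone()[0])
cur.close()
```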
Quick question: I'm developing SaaS software (an ERP).
I designed it with one database per account, for these reasons:
I do a lot of personalisation and need to add specific table columns for each account.
It's easier to manage db backups (and reload data!).
It's less risky: sometimes I need to run SQL queries on a table, and if a bad query (update/delete...) causes an error, only one customer is affected instead of all of them.
Bad point: I'm ending up with hundreds of databases...
I'm hiring a company to manage my servers, and they said that it's better to have only one database with a few tables, putting all the data in the same tables with a column such as id_account. I'm very, very surprised by these words, so I'm wondering... what are your ideas?
Thanks !
Frederic
In the environment I am currently working in, we handle millions of records from numerous clients. Our solution is to use schemas to segregate each individual client. A schema allows you to partition your clients into separate virtual databases while staying inside a single db. Each schema has an exact copy of the tables from your application.
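A hedged sketch of that setup with PostgreSQL and psycopg2; the schema naming convention and the table layout are illustrative assumptions:

```python
import psycopg2

conn = psycopg2.connect("dbname=app user=app_user password=secret")

def provision_client(client_id: int) -> None:
    """Create a dedicated schema for a new client, with a copy of the app's tables."""
    schema = f"client_{client_id}"  # trusted, app-generated name (never raw user input)
    with conn, conn.cursor() as cur:
        cur.execute(f"CREATE SCHEMA IF NOT EXISTS {schema}")
        cur.execute(f"""
            CREATE TABLE IF NOT EXISTS {schema}.invoices (
                id     serial PRIMARY KEY,
                total  numeric NOT NULL,
                issued date NOT NULL
            )
        """)

def invoices_for_client(client_id: int):
    """Select the client's schema before the db call, as described above."""
    with conn, conn.cursor() as cur:
        cur.execute(f"SET search_path TO client_{client_id}")
        cur.execute("SELECT id, total FROM invoices ORDER BY issued DESC")
        return cur.fetchall()
```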
The upside:
Segregated client data
data from a single client can be easily backed up, exported or deleted
Programming is still the same, but you have to select the schema before db calls
Moving clients to another db or standalone server is a lot easier
adding specific tables per client is easier (see below)
single instance of the database running
tuning the db affects all tenants
The downside:
Unless you manage your shared schema properly, you may duplicate data
Migrations are repeated for every schema
You have to remember to select the schema before db calls
I'm hard pressed to add many negatives... I guess I may be biased.
Adding specific tables: why would you add client-specific tables if this is SaaS and not custom software? Better to use a Postgres DB with an hstore field and store as much searchable per-client data as you like.
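A hedged sketch of that hstore suggestion via psycopg2; the table and field names are illustrative assumptions:

```python
import psycopg2
import psycopg2.extras

conn = psycopg2.connect("dbname=app user=app_user password=secret")

# Enable the extension first, then teach psycopg2 to map hstore <-> dict.
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS hstore")
psycopg2.extras.register_hstore(conn)

with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS client_settings (
            client_id  integer PRIMARY KEY,
            attributes hstore   -- per-client custom fields, no schema change needed
        )
    """)
    cur.execute(
        "INSERT INTO client_settings (client_id, attributes) VALUES (%s, %s)",
        (42, {"invoice_footer": "Merci !", "vat_number": "FR123"}),
    )
    # Still searchable: find clients that define a given custom field.
    cur.execute(
        "SELECT client_id FROM client_settings WHERE attributes ? %s", ("vat_number",)
    )
    print(cur.fetchall())
```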
Schemas are ideal for multi-tenant databases.
A lot of what I am telling you depends on your software stack, the capabilities of your developers, and the backend db you selected (all of which you neglected to mention).
Your hardware guys should not decide your software architecture. If they do, you are likely shooting yourself in the foot before you even get out of the gate. Get a good senior software architect; the grief they will save you will likely save your business.
I hope this helps...
Bonne Chance
This is more of a conceptual question, so variations on the stack are welcome if they can accomplish the same concept. We're currently on MySQL and expanding some services out into MongoDB.
The idea is that we would like to manage a single physical database schema/structure, so that adjustments, expansions, etc. don't become overly cumbersome as the number of clients utilizing the structure grows into the thousands, tens of thousands, and beyond. However, we would like to segregate their data at this level, rather than simply at the application layer, to provide a more rigid separation. Is it possible to create virtual bins for each client using the same structure, but have their data structurally separated from one another?
The normal way would obviously be adding client keys to every row of data, either directly or via foreign relationships, but given that we can't foresee with 20/20 vision how hacks allowing "cross-client" data retrieval might occur on our system, I wanted to go a little further and embed the separation at a virtually structural level.
I've also read another post here: MySQL: how to do row-level security (like Oracle's Virtual Private Database)?, which uses views as the method, but this seems to become more work the larger the list of clients grows.
Thanks!
---- EDIT ----
Based on some of the literature suggested below, here's a little more info on our intent:
The closest situation of the three outlined in the MSDN article provided by @Stennie would be a single database with multiple schemas; the difference being that we're not interested in customizing client schemas after their creation, and would actually prefer they remain locked to the parent/master schema.
Ideally the solution would keep each schema linked to the parent table-set structure rather than simply duplicating it, so that any change to the parent or master schema would be cascaded across all client/tenant schemas.
Taking it a step further, in a cluster we could have a single master with the master schema, and each slave replicating from it but holding a sharded set of tenants. Changes to the master could then be filtered down through the cluster without interruption and would maintain consistency across all instances, also allowing us to update the application layer faster, knowing that all DBs are compatible with the updated schemas.
Hope that makes sense, I'm still a little fresh at this level.
There are a few common infrastructure approaches ranging from "share nothing" (aka multi-instance) to "share everything" (aka multi-tenant).
For example, a straightforward approach to your "virtual bins" would be to allocate a database per client using shared database servers. This is somewhere in between the two sharing extremes, as your customers would be sharing database server infrastructure but keeping their data and schema separate.
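A hedged sketch of provisioning such a per-client database on a shared MySQL server; the naming convention, host details, and required MySQL version (5.7+ for IF NOT EXISTS on users) are illustrative assumptions:

```python
import mysql.connector

# Administrative connection used only for provisioning.
admin = mysql.connector.connect(host="db.example.com", user="provisioner", password="...")

def provision_client(client_id: int, client_password: str) -> None:
    """Create a dedicated database plus a MySQL account that can only see it."""
    db = f"client_{client_id}"  # trusted, app-generated identifier
    cur = admin.cursor()
    cur.execute(f"CREATE DATABASE IF NOT EXISTS `{db}`")
    cur.execute(f"CREATE USER IF NOT EXISTS '{db}'@'%' IDENTIFIED BY '{client_password}'")
    cur.execute(f"GRANT ALL PRIVILEGES ON `{db}`.* TO '{db}'@'%'")
    cur.close()
```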
A database-per-client approach would allow you to:
manage authentication and access per client using the database's authentication & access controls
support different database software (you mention using both MySQL which supports views, and MongoDB which does not)
more easily backup and restore data per client
avoid potential cross-client leakage at a database level
avoid excessive table growth and related management issues for a single massive database
Some potential downsides would include:
having more databases to manage
in the case of a database where you want to enforce a fixed schema (e.g. MySQL), you will need to apply schema changes across all your databases or support some form of versioning (see the sketch after this list)
in the case of a database which preallocates storage (e.g. MongoDB) you may use more storage per client (particularly if your actual data size is small)
you may run into limits on namespaces or open files
you still have to worry about application and data security :)
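Here is the sketch referred to in the schema-changes point above: applying one migration across every per-client database. The client_ naming convention and the migration itself are illustrative assumptions:

```python
import mysql.connector

admin = mysql.connector.connect(host="db.example.com", user="migrator", password="...")

MIGRATION = "ALTER TABLE invoices ADD COLUMN due_date DATE NULL"  # hypothetical change

cur = admin.cursor()
cur.execute(r"SHOW DATABASES LIKE 'client\_%'")  # \_ = literal underscore, % = wildcard
client_dbs = [row[0] for row in cur.fetchall()]

for db in client_dbs:
    cur.execute(f"USE `{db}`")
    cur.execute(MIGRATION)   # assumes the column does not already exist in that copy
    print("migrated", db)
cur.close()
```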
If you do some research on multi-tenancy you will find some other solutions ranging from this example (isolated DB per client on shared database server architecture) through to more complex partitioned data schemes.
This Microsoft article includes a useful overview of approaches and considerations: Multi-tenant SaaS database tenancy patterns.
I'm creating a multi-user/multi-company web application in PHP & MySQL. I'm interested to know what the best practice is with regard to structuring my database(s).
There will be hundreds of companies and thousands of users of this web app, so it needs to be robust. Each company won't be able to see other companies' data, just their own. We will be storing mainly text data, probably only a few MB per company.
Currently the database contains 14 tables (for one sample company).
Is it better to put the data for all companies and their users in a single database and create a unique companyID for each one?
or:
Is it better to put each company's data in its own database and create a new database and table set for each new company that I add?
What are the pluses and minuses to each approach?
Thanks,
Stephen
If a single web app is being used by all the different companies, then unless you have a very specific need or reason to use separate databases (it doesn't sound like you do), you should definitely use a single database.
Your application will be responsible for only showing the correct information to the correct authenticated users.
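To make that concrete, a hedged sketch of application-layer separation in a single database, scoping every query by the authenticated user's company ID (shown in Python for brevity; the same idea applies in PHP, and the table and column names are illustrative):

```python
import mysql.connector

conn = mysql.connector.connect(
    host="localhost", user="webapp", password="...", database="erp"
)

def invoices_for(company_id: int):
    """Return only rows belonging to the caller's company."""
    cur = conn.cursor(dictionary=True)
    cur.execute(
        "SELECT id, total, issued FROM invoices WHERE company_id = %s",
        (company_id,),  # parameterized, never string-concatenated
    )
    rows = cur.fetchall()
    cur.close()
    return rows
```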
Multiple databases would be a nightmare to maintain. For each new company you'd have to create and administer another one, and if you make a change to one schema, you'd have to repeat it across every company's copy of those 14+ tables.
Thousands of users and thousands of apps shouldn't pose a problem at all, as long as you're using a real database and not Access or something silly like that.
Multi-tenant
Pluses
Relatively easy to develop: only change database code in one place.
Lets you easily create queries which use data for multiple tenants.
Straightforward to add new tenants: no code needs to change.
Transforming a multi-tenant to a single-tenant setup is easy, should you need to change your design.
Minuses
Risk of data leaking between tenants if coding is sloppy. Tenant view filters can in some cases be employed to reduce this risk; this method relies on using different database user accounts for different tenants (see the sketch after this list).
If you break the code, all tenants will be affected.
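The sketch referred to above: one way a tenant view filter can look in MySQL, where each tenant connects with its own database account and only sees its rows through a view. All object names here are illustrative assumptions:

```python
import mysql.connector

admin = mysql.connector.connect(
    host="localhost", user="root", password="...", database="erp"
)
cur = admin.cursor()

# Map database accounts to tenants.
cur.execute("""
    CREATE TABLE IF NOT EXISTS tenant_accounts (
        db_user   VARCHAR(32) PRIMARY KEY,
        tenant_id INT NOT NULL
    )
""")

# Expose a filtered view of the shared table; SESSION_USER() is the connecting
# account even inside a SECURITY DEFINER view.
cur.execute("""
    CREATE OR REPLACE SQL SECURITY DEFINER VIEW my_invoices AS
    SELECT i.*
    FROM invoices AS i
    JOIN tenant_accounts AS t ON t.tenant_id = i.tenant_id
    WHERE t.db_user = SUBSTRING_INDEX(SESSION_USER(), '@', 1)
""")

# Tenant accounts get SELECT on the view only, never on the base table
# (assumes the 'acme_app' account already exists).
cur.execute("GRANT SELECT ON erp.my_invoices TO 'acme_app'@'%'")
cur.close()
```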
Single-tenant
Pluses
If you have very different requirements for different tenants, several different database models can be beneficial. This is the best case for using a single tenant setup.
If you code sloppily, there's practically no risk of data leak between tenants (tenant A will not be able to access tenant B's data). In addition, if you accidentally destroy the schema of one tenant through a botched update, other tenants will remain unaffected.
Less SQL code when you don't need to take tenant ID values into account in your queries
Minuses
Database schemas tend to diverge over time, often resulting in a nightmare. A database comparison tool can alleviate this problem, but potentially many schemas need to be compared.
Including data from several databases in one query is typically complex, and often requires prepared statements.
Developing is hard, since you need to make the same changes to multiple schemas.
The same database entity can appear in many databases with different ID keys, resulting in confusion.
Transforming a single-tenant to a multi-tenant setup is very hard, should you need to change your design.
A single database is the relational way. One aspect of this perspective is that databases gather statistics about database usage and make heavy use of them. If you split things up, you will be shooting yourself in the foot, as the statistics will be fragmented.