Segregating sandbox environment - mysql

For a site that is using a Sandbox mode, such as a Payment site, would a separate database be used, or the same one?
I am examining two schemas for the production and sandbox environments. Here are the two options.
OPTION 1:
Clone the database and route requests to the correct database based on sandbox mode.
OPTION 2:
A single database whose 'main tables' have an is_sandbox boolean.
What would be the pros and cons of each method?

In most situations, you'd want to keep two separate databases. There's no good reason to have the two intermingled in the same database, and a lot of very good reasons to keep them separated:
Keeping track of which entities are in which "realm" (production vs. sandbox) is extra work for your code, and you'll likely have to include it in a lot of places.
You'll need that logic in your database schema as well. UNIQUE indexes all have to include the realm, for instance.
If you forget any of that code, you've got yourself a potential security vulnerability. A malicious user could cause data from one realm to influence the other. Depending on what your application is, this could range anywhere from annoying to terrifying. (If it's a payment application, for instance, the potential consequences are incredibly dire: "pretend" money from the sandbox could be converted into real money!)
Even if your code is all perfect, some information will still unavoidably leak between the realms. For instance, if your application uses any sequential identifiers (e.g. AUTO_INCREMENT in MySQL), gaps in the values seen in the sandbox will correspond to values used in production. Whether this matters is debatable, though.
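To make the OPTION 2 costs concrete, here is a minimal sketch, not a reference implementation. SQLite (via Python's built-in sqlite3 module) stands in for MySQL, and the merchants table and helper names are hypothetical. It shows the realm column leaking into the UNIQUE constraint and into every query, and one id sequence shared by both realms.

```python
import sqlite3

# Hypothetical OPTION 2 table. SQLite stands in for MySQL; the constraint
# logic carries over, although MySQL DDL syntax differs slightly.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE merchants (
        id INTEGER PRIMARY KEY,  -- one id sequence shared by both realms
        is_sandbox INTEGER NOT NULL CHECK (is_sandbox IN (0, 1)),
        email TEXT NOT NULL,
        UNIQUE (is_sandbox, email)  -- UNIQUE (email) alone would clash across realms
    )
    """
)


def add_merchant(email: str, sandbox: bool):
    """Insert a merchant and return its id, or None if the realm already has that email."""
    try:
        with conn:
            cur = conn.execute(
                "INSERT INTO merchants (is_sandbox, email) VALUES (?, ?)",
                (int(sandbox), email),
            )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None


def find_merchant_id(email: str, sandbox: bool):
    # Every query has to repeat the realm filter; dropping it mixes the realms.
    row = conn.execute(
        "SELECT id FROM merchants WHERE is_sandbox = ? AND email = ?",
        (int(sandbox), email),
    ).fetchone()
    return row[0] if row else None
```

If a sandbox merchant, a production merchant and another sandbox merchant are created in that order, the sandbox sees ids 1 and 3, revealing that something (id 2) was created in production in between.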
Using two separate databases neatly solves all these problems. It also means you can easily clean out the sandbox when needed.
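A minimal sketch of OPTION 1, assuming the realm can be decided once per request (here from a hypothetical "sandbox." hostname prefix). Two in-memory SQLite databases stand in for the two MySQL databases; Realm, CONNECTIONS and the payments table are illustrative names. Note that the SQL itself never mentions the realm.

```python
import sqlite3
from enum import Enum


class Realm(Enum):
    PRODUCTION = "production"
    SANDBOX = "sandbox"


# Identical schema in both databases; there is no is_sandbox column.
SCHEMA = "CREATE TABLE payments (id INTEGER PRIMARY KEY, amount_cents INTEGER NOT NULL)"


def open_realm_databases():
    # In-memory SQLite stands in for two MySQL databases (separate schemas
    # or servers); a real setup would read their DSNs from configuration.
    connections = {}
    for realm in Realm:
        conn = sqlite3.connect(":memory:")
        conn.execute(SCHEMA)
        connections[realm] = conn
    return connections


CONNECTIONS = open_realm_databases()


def realm_for_host(host: str) -> Realm:
    # Hypothetical rule: the sandbox is served from a "sandbox." subdomain.
    return Realm.SANDBOX if host.startswith("sandbox.") else Realm.PRODUCTION


def record_payment(host: str, amount_cents: int) -> int:
    # The realm decision happens once, here; the SQL is realm-agnostic.
    conn = CONNECTIONS[realm_for_host(host)]
    with conn:
        cur = conn.execute("INSERT INTO payments (amount_cents) VALUES (?)", (amount_cents,))
    return cur.lastrowid


def payment_count(realm: Realm) -> int:
    return CONNECTIONS[realm].execute("SELECT COUNT(*) FROM payments").fetchone()[0]


def reset_sandbox() -> None:
    # Cleaning out the sandbox cannot touch production rows.
    with CONNECTIONS[Realm.SANDBOX] as conn:
        conn.execute("DELETE FROM payments")
```

Because each realm has its own id sequence, the first sandbox payment and the first production payment both get id 1, and reset_sandbox() wipes the sandbox without any WHERE clause standing between it and production data.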
Exception: If your application is an almost entirely public website (e.g. Stack Overflow or Wikipedia), or involves social aspects that are difficult to replicate in a sandbox (like Facebook), a more integrated sandbox may make more sense.

Related

MySQL Multiple Databases for User Authentication

I have a few related web sites and it seems rather unfortunate that they have completely separate user databases. I've been contemplating different options on how to unify the databases:
1. Rework the sites to be running on one copy of my content management system rather than independent software. Pros: Seems clean. Cons: Complicated by needing to rewrite a lot of the backend of one of the sites to support the different features of the other site.
2. Use the OAuth backend I wrote to interface with Facebook to authenticate back and forth between the sites. Pros: Seems to be using OAuth for what it was meant to do. Cons: it requires at least some redundancy where I'd need to store duplicate user data on both sites and this could get out of sync. Also seems like overkill for two sites running on the same server.
3. Connect to both databases whenever an account is created or modified on either site and apply the modifications to the other site. Pros: seems to avoid risk of falling out of sync and avoids complications of having to create and receive OAuth data between the sites. Cons: it requires full duplication of user information between the sites.
4. Choose one of the sites as having the canonical database and have the user authentication mechanism of the other site connect to the first site's MySQL database, while still connecting to a separate database for the rest of the site's functionality.
I'm not totally happy with any of the options, although #4 feels like the simplest to implement as I'm thinking about it. Nonetheless, before I embark on such a project, I thought I'd ask about potential pitfalls I might be overlooking, since none of the ideas are entirely trivial. I'd appreciate advice on which might be considered "best practices" and, perhaps more importantly, which one would have the most impact on server resources. I'm using Perl's DBD::mysql to interact with the databases.
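A sketch of option #4, written in Python with sqlite3 rather than Perl/DBD::mysql, so the two connection objects are stand-ins for two MySQL handles. The table layout, function names and the PBKDF2 iteration count are assumptions for illustration. It also surfaces one pitfall: the second site stores user ids that point into a database it does not own.

```python
import hashlib
import hmac
import os
import sqlite3

# Stand-ins: in option #4 these would be two MySQL connections, one to the
# canonical site's user database and one to this site's own database.
auth_db = sqlite3.connect(":memory:")     # canonical users (owned by site A)
content_db = sqlite3.connect(":memory:")  # everything else (owned by site B)

auth_db.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT UNIQUE NOT NULL,"
    " salt BLOB NOT NULL, pw_hash BLOB NOT NULL)"
)
content_db.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, body TEXT NOT NULL)"
)


def hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2 from the standard library; the iteration count is illustrative.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def register(username: str, password: str) -> int:
    """Account creation lives on the canonical site only."""
    salt = os.urandom(16)
    with auth_db:
        cur = auth_db.execute(
            "INSERT INTO users (username, salt, pw_hash) VALUES (?, ?, ?)",
            (username, salt, hash_password(password, salt)),
        )
    return cur.lastrowid


def authenticate(username: str, password: str):
    """Site B checks credentials directly against site A's database."""
    row = auth_db.execute(
        "SELECT id, salt, pw_hash FROM users WHERE username = ?", (username,)
    ).fetchone()
    if row is None:
        return None
    user_id, salt, stored = row
    return user_id if hmac.compare_digest(stored, hash_password(password, salt)) else None


def create_post(user_id: int, body: str) -> None:
    # user_id points into the other database; nothing in this sketch enforces
    # it, so deleting a user on site A would leave orphaned posts here.
    with content_db:
        content_db.execute("INSERT INTO posts (user_id, body) VALUES (?, ?)", (user_id, body))
```

Where possible, giving the second site's MySQL account only the privileges it actually needs on the canonical users table (for example SELECT) limits the damage if that site is compromised.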

Virtual Segregation of Data in Multi-tenant MySQL Database

This is more of a conceptual question so variations on the stack are welcome should they be capable of accomplishing the same concept. We're currently on MySQL and expanding some services out into MongoDB.
The idea is that we would like to manage a single physical database schema/structure so that adjustments, expansions, etc. don't become overly cumbersome as the number of clients using the structure grows into the thousands, tens of thousands, hundreds of thousands, and beyond. However, we would like to segregate their data at this level rather than only at the application layer, to provide a more rigid separation. Is it possible to create virtual bins for each client that use the same structure but keep their data structurally separated from one another?
The normal way would obviously be adding client keys to every row of data, either directly or via foreign-key relationships, but since we can't foresee with 20/20 vision every way our system might be hacked to allow "cross-client" data retrieval, I wanted to go a little further and embed the separation at a virtually structural level.
I've also read another post here: MySQL: how to do row-level security (like Oracle's Virtual Private Database)? which uses "views" as a method but this seems to become more work the larger the list of clients.
Thanks!
---- EDIT ----
Based on some of the literature suggested below, here's a little more info on our intent:
The closest situation of the three outlined in the MSDN article provided by @Stennie would be a single database with multiple schemas; the difference is that we're not interested in customizing client schemas after their creation. We would actually prefer they remain locked to the parent/master schema.
Ideally the solution would keep each schema linked to the parent table-set structure rather than simply duplicating it, so that any change to the parent or master schema would be cascaded across all client/tenant schemas.
Taking it a step further, in a cluster we could have a single master holding the master schema, with each slave replicating from it but holding a sharded set of tenants. Changes to the master could then filter down through the cluster without interruption, maintaining consistency across all instances and letting us update the application layer faster, knowing that all DBs are compatible with the updated schemas.
Hope that makes sense, I'm still a little fresh at this level.
There are a few common infrastructure approaches ranging from "share nothing" (aka multi-instance) to "share everything" (aka multi-tenant).
For example, a straightforward approach to your "virtual bins" would be to allocate a database per client using shared database servers. This is somewhere in between the two sharing extremes, as your customers would be sharing database server infrastructure but keeping their data and schema separate.
A database-per-client approach would allow you to:
manage authentication and access per client using the database's authentication & access controls
support different database software (you mention using both MySQL, which supports views, and MongoDB, which only gained read-only views in version 3.4)
more easily backup and restore data per client
avoid potential cross-client leakage at a database level
avoid excessive table growth and related management issues for a single massive database
Some potential downsides would include:
having more databases to manage
in the case of a database where you want to enforce a certain schema (e.g. MySQL), you will need to apply the schema changes across all your databases or support some form of versioning
in the case of a database which preallocates storage (e.g. MongoDB), you may use more storage per client (particularly if your actual data size is small)
you may run into limits on namespaces or open files
you still have to worry about application and data security :)
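For the schema-versioning point, a minimal sketch of applying the same ordered migrations to every client database and recording the version each one has reached. SQLite's PRAGMA user_version serves as the version store here; with MySQL you would more likely keep a small version table in each database. MIGRATIONS, the invoices table and the tenant names are hypothetical.

```python
import sqlite3

# Hypothetical ordered migrations; entry i brings a database to version i + 1.
MIGRATIONS = [
    "CREATE TABLE invoices (id INTEGER PRIMARY KEY, amount_cents INTEGER NOT NULL)",
    "ALTER TABLE invoices ADD COLUMN currency TEXT NOT NULL DEFAULT 'USD'",
]


def schema_version(conn: sqlite3.Connection) -> int:
    # SQLite keeps a spare integer in its header; in MySQL you would keep a
    # one-row schema_version table in each client database instead.
    return conn.execute("PRAGMA user_version").fetchone()[0]


def migrate(conn: sqlite3.Connection) -> int:
    """Apply only the migrations this database has not seen yet."""
    start = schema_version(conn)
    for version, statement in enumerate(MIGRATIONS[start:], start=start + 1):
        conn.execute(statement)
        conn.execute(f"PRAGMA user_version = {version}")  # PRAGMA takes no bind params
        conn.commit()
    return schema_version(conn)


def migrate_all(tenants: dict) -> dict:
    # The same loop serves 10 or 10,000 client databases.
    return {name: migrate(conn) for name, conn in tenants.items()}
```

Keep in mind that MySQL commits implicitly around DDL statements such as ALTER TABLE, so a multi-step migration cannot be rolled back as a unit; recording the version after each step at least tells you where each client database stopped.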
If you do some research on multi-tenancy you will find some other solutions ranging from this example (isolated DB per client on shared database server architecture) through to more complex partitioned data schemes.
This Microsoft article includes a useful overview of approaches and considerations: Multi-tenant SaaS database tenancy patterns.

one big database, or one per client?

I've been asked to develop an application that will be rolled out to a number of business units. The application will be basically the same for each unit, but will have minor procedural differences, which won't change the structure of the underlying database. Should I use one database per business unit, or one big database for all the units? The business units are totally separate.
My preference is for one database per client. The advantages:
if a client gets too big, they're easy to move - backup, restore, change the connection string, boom. Try doing that when their data is mixed in with others in a massive database. Even if you use schemas and filegroups to segregate, moving them is not a cakewalk.
ditto for deleting a client's data when they move on.
by definition you're keeping each client's data separate. This is often going to be a want, and sometimes a need. Sometimes it will even be legally binding.
all of your code within a database is simpler - it doesn't have to include the client's schema (which can't be parameterized) and your tables don't have to be littered with an extra column indicating the client.
A lot of people will claim that managing 200 or 500 databases is a lot harder than managing 10 databases. It's not really any different, in my experience. You build scripts that automate things, you stagger index maintenance and backup jobs, etc.
The potential disadvantages are when you get up into the realm of 4-digit and higher databases per instance, where you want to start thinking about having multiple servers (the threshold really depends on the workload and the hardware, so I'm just picking a number). If you build the system right, adding a second server and putting new databases there should be quite simple. Again, the app should be aware of each client's connection string, and all you're doing by using different servers is changing the instance the connection string points to.
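A minimal sketch of "the app should be aware of each client's connection string": a hypothetical in-memory catalog mapping each client to a server and a database. The server names, the client_<name> database naming and the DSN format are assumptions; a real catalog would itself live in a small database of its own.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TenantLocation:
    server: str    # database instance hosting this client
    database: str  # the client's own database on that instance

    def connection_string(self) -> str:
        # Illustrative DSN shape only; use whatever format your driver expects.
        return f"mysql://{self.server}/{self.database}"


class TenantCatalog:
    """Hypothetical central lookup: client -> where its database lives."""

    def __init__(self) -> None:
        self._locations: dict[str, TenantLocation] = {}

    def add(self, client: str, server: str) -> None:
        self._locations[client] = TenantLocation(server, f"client_{client}")

    def connection_string(self, client: str) -> str:
        return self._locations[client].connection_string()

    def move(self, client: str, new_server: str) -> None:
        # Run after the backup has been restored on new_server: the database
        # name stays the same, only the instance it points to changes.
        self._locations[client] = replace(self._locations[client], server=new_server)

    def remove(self, client: str) -> TenantLocation:
        # Offboarding: the returned location is the one database to drop.
        return self._locations.pop(client)

    def clients_on(self, server: str) -> list[str]:
        return sorted(c for c, loc in self._locations.items() if loc.server == server)
```

Moving a client to a second server, or removing them when they leave, is then a catalog update plus a backup/restore or a DROP DATABASE, with no change to the application's queries.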
Some questions over on dba.SE you should look at. They're not all about SQL Server, but many of the concepts and challenges are universal:
https://dba.stackexchange.com/questions/16745/handling-growing-number-of-tenants-in-multi-tenant-database-architecture
https://dba.stackexchange.com/questions/5071/what-are-the-performance-implications-of-running-multiple-smaller-dbs-instead-of
https://dba.stackexchange.com/questions/7924/one-big-database-vs-several-smaller-ones
Your question is a design question. In order to answer it, you need to understand the requirements of the system that you want to build. From a technical perspective, SQL Server -- or really any database -- can handle either scenario.
Here are some things to think about.
The first question is how separate your clients need the data to be. Mixing data together from different business units may not be legal in some cases (say, the investment side of a bank and the market analysis side). In such situations, separate databases are the solution.
The next question is security. In some situations, clients might be very uncomfortable knowing that their data is intermixed with other clients' data. A small slip-up, and confidential information is inadvertently shared. This is probably not an issue for different business units in the same company.
Do you have to deal with different uptime requirements, upload requirements, customizations, and perhaps interaction with other tools? If one business unit will need customizations ASAP that other business units are not interested in, then that suggests different databases.
Another consideration is performance. Does this application use a lot of expensive resources? If so, being able to partition the application on different databases -- and potentially different servers -- may be highly desirable.
On the other hand, if much of the data is shared, and the repository is really a central repository with the same underlying functionality, then one database is a good choice.

Multiple Domains Site Design Decision

I am developing a project whose domain name is meaningful in my native language, so I bought a second, English domain for global use.
My question is: how should I construct my site?
Two different projects, or one project with localization support?
Two different databases, or a shared database?
What are my goals?
I don't want to show English content on the native site, and vice versa.
I want to be able to update the site easily.
If you suggest a shared database, could you please describe the design principles for the database?
Thank You.
Typically, you want to avoid forking application code for any reason, including language. There are a few quick things you need to watch out for:
Ensure that strings are not hardcoded
Store all datetimes in UTC
Ensure that all user profiles have an associated timezone (you can grab this from the user's browser)
Try to ensure that your presentation is separate from your page content (i.e. use CSS, Master Pages, Templates or whatever your platform supports).
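A small sketch of two of these points: UI strings looked up by locale instead of being hardcoded, and datetimes normalized to UTC for storage and converted to the user's timezone only for display. The message keys and the "de" locale are illustrative assumptions, and fixed UTC offsets are used so the example runs without timezone data installed.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical message catalog: UI strings live in data keyed by locale
# rather than in templates. "de" is just an example second locale.
MESSAGES = {
    "en": {"last_login": "Last login: {when}"},
    "de": {"last_login": "Letzte Anmeldung: {when}"},
}

# Fixed offsets keep the example dependency-free; a real application would
# store IANA names such as "Asia/Tokyo" and use zoneinfo.ZoneInfo.
TOKYO = timezone(timedelta(hours=9))
NEW_YORK_WINTER = timezone(timedelta(hours=-5))


def to_utc(dt: datetime) -> datetime:
    """Normalize a timezone-aware datetime to UTC before it is stored."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: its timezone is unknown")
    return dt.astimezone(timezone.utc)


def render_last_login(stored_utc: datetime, user_tz, locale: str) -> str:
    """Convert back to the user's timezone only when displaying."""
    local = stored_utc.astimezone(user_tz)
    strings = MESSAGES.get(locale, MESSAGES["en"])  # fall back to English
    return strings["last_login"].format(when=local.strftime("%Y-%m-%d %H:%M"))
```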
As for the database, this depends more on the data you're holding. For example, if:
You want users to share logins across both sites
Knowledge to be shared but not necessarily localized (Wiki Entries)
The sites are managing a shared resource (i.e. a single warehouse)
You might want to have one database.
However, if you find the following are true:
You don't want or need users to cross over between the sites (think amazon.com and amazon.co.uk)
Knowledge is wholly separate with entries in one language being irrelevant to the other
The sites are managing wholly separate resources (i.e. two separate warehouses)
You might lean towards two separate databases. This will give you an advantage in scaling (though it's not a silver bullet), and as long as the schemas are identical across the databases, you will likely find that it's not too onerous.
One other option is to identify shared resources and split them into another repository (think user logins etc...). This can get you the best of both worlds but of course is a more complex design.
Remember, all of this can be added after the fact; it just becomes harder. Sometimes it's more important to get to market than to try to solve all your problems up front.
Good Luck!
I'm not quite sure what would work for you, but I think localization support would be nice. With a shared database you won't need to support two different databases, and you won't need to add an extra database every time you add a new language. On the application side it would also be easier: to support another language, you just add it to your configuration instead of creating another project.

Is it a good practice to put the tables of different versions of a website (no data sharing among these versions) in one database?

I am developing a website. There is an English version, a Japanese version and a Chinese version; each is for speakers of a different language. If you are a registered user of the English version and you want to use the Japanese version, you still need to register on the Japanese version. So should I create one database and put all the tables in it, or should I create 3 databases, one for each version?
If these sites share no data, I would say it's better to create a separate database for each. This will prevent you from accidentally damaging another version's tables if you mess up a query.
Make the tables reasonably separate, but don't close the door to possible future requirements. Databases in MySQL are a fine mechanism that fits both: they are a nice way to namespace the tables, and the separation is weak, so you won't have problems with cross-database queries. Use schemas in more sophisticated database systems.
As RaYell says, it depends on the amount of data/tables shared among these different versions. I would recommend that you look into schema support for your particular database, partition by schema for data separation, and have different users own the separate schemas for access security.
In Oracle Database, for example, each user is assigned its own schema, so you could have user_en, user_jp.
Alternatively you could look into multilingual database design.
It really depends on how much data is to be shared (or combined for reporting). Even if management say "no, everything is separate" now, that'll change in 5 minutes. Always. :-)
I've worked on a number of multi-tenant systems, and would recommend a single database, designed so each site has its own ID; the negative side is the SiteID column must then be included in most of the tables, foreign keys and the associated queries. On the positive side it does allow a site's data to be extracted easily if one site is sold off, or its server is moved to a separate location for legal reasons, etc.
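A sketch of the SiteID design described above, using SQLite as a stand-in for the real database; the table and column names are hypothetical. SiteID is part of every primary key, UNIQUE constraint and foreign key, which is the "negative side" mentioned, but it also lets the database itself reject a row that references another site's data, and it makes extracting one site's data a simple filter.

```python
import sqlite3

# SQLite stands in for the real database; tables and columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript(
    """
    CREATE TABLE users (
        site_id INTEGER NOT NULL,
        user_id INTEGER NOT NULL,
        email   TEXT NOT NULL,
        PRIMARY KEY (site_id, user_id),
        UNIQUE (site_id, email)  -- the same email may register on each site
    );
    CREATE TABLE posts (
        site_id INTEGER NOT NULL,
        post_id INTEGER NOT NULL,
        user_id INTEGER NOT NULL,
        body    TEXT NOT NULL,
        PRIMARY KEY (site_id, post_id),
        -- site_id is part of the FK, so a post cannot reference another site's user
        FOREIGN KEY (site_id, user_id) REFERENCES users (site_id, user_id)
    );
    """
)


def _try_insert(sql: str, params: tuple) -> bool:
    """Run one INSERT; return False if a key or foreign key check rejects it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


def add_user(site_id: int, user_id: int, email: str) -> bool:
    return _try_insert("INSERT INTO users VALUES (?, ?, ?)", (site_id, user_id, email))


def add_post(site_id: int, post_id: int, user_id: int, body: str) -> bool:
    return _try_insert("INSERT INTO posts VALUES (?, ?, ?, ?)", (site_id, post_id, user_id, body))


def export_site(site_id: int) -> dict:
    # Extracting one site (e.g. if it is sold off) is a filter on site_id.
    return {
        "users": conn.execute(
            "SELECT user_id, email FROM users WHERE site_id = ? ORDER BY user_id", (site_id,)
        ).fetchall(),
        "posts": conn.execute(
            "SELECT post_id, user_id, body FROM posts WHERE site_id = ? ORDER BY post_id",
            (site_id,),
        ).fetchall(),
    }
```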
I'd also recommend using Unicode (or UTF-8) for all user-visible or possibly-localizable data. It'll save a lot of grief later on.
Definitely it is better to have separate databases, otherwise you will have to come up with different naming conventions for tables etc. If you have code that accesses these tables, then you will need to modify all that code as well instead of just reconfiguring the database bindings.
The answer, as usual, is "it depends." The real question, I think, is how you plan on maintaining your system.
If you are going to have a single website that allows the user to select a language (or has different versions appear at different URLs), then I would use a single database, a single set of application scripts, etc. This way minor changes in schema only need to be reflected in one database. Each table with user content would have a SiteID column, much as devstuff recommends. A second advantage to this approach is that you can have a single user authentication system and actually let users switch from one system to another -- or eventually fuse them all together.
If you are going to have multiple applications, multiple programmers, multiple skins, etc., you may find it easier to have multiple databases. But this means that you will also have dramatically higher development costs. In some cases this is worth the trouble; in most cases it is not.