Providing the right Sequelize.js object to connect to the database (multi-tenant application) - MySQL

First, some background:
We are trying to create a multi-tenant application. We first thought of going with the MEAN stack and creating multiple collections per tenant (e.g. order_tenant1, order_tenant2, etc.), but then we went through some blogs that advised against this approach. Second, we felt the need for transactions as a core requirement of our DB, so we opened ourselves up to an RDBMS like MySQL or MariaDB. We then stumbled upon a blog which explains the approach in a lot of detail: it says to create views to get, update, and insert the data related to a tenant, with the view's parameter defined through the connection string. Since we are using Node.js, I found an ORM for MySQL, Sequelize.js, which is quite good.
The actual problem:
In my experience with the MEAN stack, we define the Mongo connection in the server.js file; the application establishes those connections at startup and keeps them alive. How can I have multiple Sequelize.js (or, for that matter, any database connection) objects, pick the one that matches the tenant the requesting user belongs to, and provide that object to the application to carry on with the business logic?
1) Should I create a new connection object on every request the application gets, and then close it after the request is processed?
2) Or is there a better way to handle this in Node, Express, or Sequelize.js?
Edited:
We have decided to use the row-based approach with tenant_id as a column, as described in the blog above, but I am struggling with how to maintain different connection objects to the database through Sequelize.js. That is, if a user belonging to tenant id 1 sends a request, the application needs to serve it with an object, say "db", which is a Sequelize object created with tenant 1's details in its connection string. A user belonging to tenant id 2 needs to be served with the same object, i.e. "db", but created with tenant 2's details in its connection string, because I want to maintain a separate connection string (database connection object) for every tenant I have to serve.
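Roughly, the pattern I have in mind is something like the sketch below (the tenant configs, the getDbForTenant helper, and req.user.tenantId are made-up names just to illustrate; it assumes the standard Sequelize constructor and an Express app):

const express = require('express');
const Sequelize = require('sequelize');

const app = express();

// Hypothetical per-tenant connection settings, e.g. loaded from a config table.
const tenantConfigs = {
  1: { database: 'app_db', username: 'tenant1_user', password: 'secret1' },
  2: { database: 'app_db', username: 'tenant2_user', password: 'secret2' }
};

// Cache one Sequelize instance per tenant so connections are reused
// instead of being opened and closed on every request.
const tenantDbs = {};

function getDbForTenant(tenantId) {
  if (!tenantDbs[tenantId]) {
    const cfg = tenantConfigs[tenantId];
    tenantDbs[tenantId] = new Sequelize(cfg.database, cfg.username, cfg.password, {
      host: 'localhost',
      dialect: 'mysql'
    });
  }
  return tenantDbs[tenantId];
}

app.get('/orders', function (req, res) {
  // req.user.tenantId is assumed to be set earlier by auth middleware.
  const db = getDbForTenant(req.user.tenantId);
  // ... run queries against db and send the response ...
  res.end();
});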

Multi-tenancy can be implemented as row-based, schema-based, or database-based. Other than that 2010 article, I think you will find few if any other recommendations for database-based multi-tenancy. Systems were not designed to talk to tens or thousands of databases, so things will keep failing on you. The basic thing you're trying to avoid is SQL injection attacks that reveal other users' data, but the proper way to avoid those is by sanitizing user inputs, which you need to do no matter what.
I highly recommend going with a normal row-based multi-tenancy approach as opposed to schema-based, as described in https://devcenter.heroku.com/articles/heroku-postgresql#multiple-schemas and in this original article: http://railscraft.tumblr.com/post/21403448184/multi-tenanting-ruby-on-rails-applications-on-heroku
Updated:
Your updated question still isn't clear about the difference between database-based and row-based multi-tenancy. You want row-based, which means you can set up a single Sequelize connection string exactly like the examples, since you'll only have a single database.
Then, your queries to the database will look like:
User.find({ where: { userid: 538 } }).complete(function(err, user) {
  console.log(user.values)
})
The multi-tenancy is provided by the userid attribute. I would urge you to do a lot more reading about databases, ORMs, and typical patterns before getting started. I think you will find the additional up-front investment pays dividends versus starting development before you fully understand how ORMs typically work.
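To make that concrete, here is a minimal sketch of row-based tenancy with a single Sequelize connection (the Order model, its tenant_id column, and the helper are made up for illustration; only standard Sequelize calls are used):

const Sequelize = require('sequelize');

// One connection (pool) for the whole application; every tenant shares it.
const sequelize = new Sequelize('app_db', 'app_user', 'app_password', {
  host: 'localhost',
  dialect: 'mysql'
});

// A tenant-scoped model: tenant_id is just another column on the row.
const Order = sequelize.define('Order', {
  tenant_id: { type: Sequelize.INTEGER, allowNull: false },
  total: Sequelize.DECIMAL(10, 2)
});

// Every query filters on the requesting user's tenant.
function ordersForTenant(tenantId) {
  return Order.findAll({ where: { tenant_id: tenantId } });
}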

Related

Is it possible to store client data on separate databases?

We have a client who is determined to keep their data in our cloud VM separate from other clients' data. That is, we have a centralized MySQL database where we store all of our client data and access it depending on an id, etc. The clients are now requesting that their data be separated from one another's, meaning that if the database is hacked, the hacker can't jump from one user's data to another's. I have never heard of this type of functionality, especially for MySQL databases (as far as I know, you can create users and grant them access to tables, but not to specific rows in a table). Possibly this is a feature of Azure databases or something.
Has anyone encountered something like this request/solution?
Thanks
I did work for a notification service. We stored each client's data in a separate schema, but on the same MySQL instance. The reason was to keep PII (Personally Identifiable Information) separate, so that on any given application request, it was not possible to accidentally read another client's data.
The application first connected to a special schema that stored a table listing all the client schemas and the username & password for each client schema. The app reads this table to query for one specific client, then opens a new connection using that username & password.
It added a little bit of overhead to every session to do this two-step connection, but it wasn't too much.
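In rough form, the two-step connection looked something like this (a sketch using Node's mysql2 driver; the directory schema, table, and column names are invented for illustration):

const mysql = require('mysql2/promise');

async function connectForClient(clientId) {
  // Step 1: connect to the special schema that lists every client schema.
  const directory = await mysql.createConnection({
    host: 'db.internal',
    user: 'directory_user',
    password: 'directory_password',
    database: 'client_directory'
  });
  const [rows] = await directory.execute(
    'SELECT schema_name, db_user, db_password FROM client_schemas WHERE client_id = ?',
    [clientId]
  );
  await directory.end();

  // Step 2: open a new connection using that client's own credentials and schema.
  const c = rows[0];
  return mysql.createConnection({
    host: 'db.internal',
    user: c.db_user,
    password: c.db_password,
    database: c.schema_name
  });
}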
I'm not sure how this eliminates the possibility of being hacked. That's still a risk. If an attacker hacks the primary database, why couldn't they also hack the specific client's database?

How to store data of different applications in same local MySQL instance if both applications have multi-DB architecture?

Application 1: Suppose I have a Twitter-like application. I need to use multiple databases/schemas (say, one to store user info, one for user logging purposes, etc.).
Application 2: Suppose I have a blog that also needs logically separated DBs (again, one to store user info, one for user logging purposes, etc.).
How can I use the same MySQL instance as the datastore for both? Since each has multiple similar DBs, there is a chance of confusing database or table names unless I use long names like twitter_users and blog_users.
Any effective solution within MySQL?
Another way is to use MaxScale as a DB proxy. It has a rewrite engine, with which you can configure a schema-name rewrite for one of the applications. The benefit is that you can use a single MySQL/MariaDB instance and dedicate the whole memory to it.
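As a very rough illustration only (this assumes MaxScale's regexfilter module; the section and pattern names are invented and the exact parameters may differ between MaxScale versions), the rewrite could be defined as a filter and attached to the blog application's service with filters=BlogSchemaRewrite:

[BlogSchemaRewrite]
type=filter
module=regexfilter
# Rewrite the schema name the blog application uses so it cannot collide
# with the other application's databases.
match=users
replace=blog_users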

Multi-tenant Django applications: altering database connection per request?

I'm looking for working code and ideas from others who have tried to build a multi-tenant Django application using database-level isolation.
Update/Solution: I ended solving this in a new opensource project: see django-db-multitenant
Goal
My goal is to multiplex requests as they come in to a single app server (WSGI frontend like gunicorn), based on the request hostname or request path (for instance, foo.example.com/ sets the Django connection to use database foo, and bar.example.com/ uses database bar).
Precedent
I'm aware of a few existing solutions for multi tenancy in Django:
django-tenant-schemas: This is very close to what I want: you install its middleware at highest precedence, and it sends a SET search_path command to the db. Unfortunately, it is Postgres specific and I am stuck with MySQL.
django-simple-multitenant: The strategy here is to add a "tenant" foreign key to all models, and adjust all application business logic to key off of that. Basically, each row becomes indexed by (id, tenant_id) rather than (id). I've tried, and don't like, this approach for a number of reasons: it makes the application more complex, it can lead to hard-to-find bugs, and it provides no database-level isolation.
One {app server, django settings file with appropriate db} per tenant. Aka poor man's multi tenancy (actually rich man's, given the resources it involves). I do not want to spin up a new app server per tenant, and for scalability I want any app server to be able to dispatch requests for any client.
Ideas
My best idea so far is to do something like django-tenant-schemas: in the first middleware, grab django.db.connection and fiddle with the database selection rather than the schema. I haven't quite thought through what this means in terms of pooled/persistent connections.
Another dead end I pursued was tenant-specific table prefixes: Setting aside that I'd need them to be dynamic, even a global table prefix is not easily achieved in Django (see rejected ticket 5000, among others).
Finally, Django multiple database support lets you define multiple named databases, and mux among them based on the instance type and read/write mode. Not helpful since there is no facility to select the db on a per-request basis.
Question
Has anyone managed something similar? If so, how did you implement it?
I've done something similar that is closest to point 1, but instead of using middleware to set a default connection, Django database routers are used. This allows the application logic to use a number of databases in each request if required. It's up to the application logic to choose a suitable database for every query, and this is the big downside of this approach.
With this setup, all databases are listed in settings.DATABASES, including databases which may be shared among customers. Each model that is customer specific is placed in a Django app that has a specific app label.
E.g. the following class defines a model which exists in all customer databases:
class MyModel(Model):
    ....

    class Meta:
        app_label = 'customer_records'
        managed = False
A database router is placed in the settings.DATABASE_ROUTERS chain to route database requests by app_label, something like this (not a full example):
class AppLabelRouter(object):
    def get_customer_db(self, model, **hints):
        # Route models belonging to 'myapp' to the 'shared_db' database,
        # irrespective of customer.
        if model._meta.app_label == 'myapp':
            return 'shared_db'

        if model._meta.app_label == 'customer_records':
            # The current customer's database name is stashed in thread-local
            # data by the calling code before the query runs.
            customer_db = thread_local_data.current_customer_db()
            if customer_db is not None:
                return customer_db
            raise Exception("No customer database selected")

        return None

    def db_for_read(self, model, **hints):
        return self.get_customer_db(model, **hints)

    def db_for_write(self, model, **hints):
        return self.get_customer_db(model, **hints)
The special part about this router is the thread_local_data.current_customer_db() call. Before the router is exercised, the caller/application must have set up the current customer db in thread_local_data. A Python context manager can be used for this purpose to push/pop a current customer database.
With all of this configured, the application code then looks something like this, where UseCustomerDatabase is a context manager to push/pop a current customer database name into thread_local_data so that thread_local_data.current_customer_db() will return the correct database name when the router is eventually hit:
class MyView(DetailView):
    def get_object(self):
        # Decide which customer database this request should use, then make
        # sure it is active while the query is evaluated.
        db_name = determine_customer_db_to_use(self.request)
        with UseCustomerDatabase(db_name):
            return MyModel.objects.get(pk=1)
This is quite a complex setup already. It works, but I'll try to summarize what I see as the advantages and disadvantages:
Advantages
Database selection is flexible. Multiple databases can be used in a single request; both customer-specific and shared databases can be queried.
Database selection is explicit (not sure if this is an advantage or disadvantage). If you try to run a query that hits a customer database but the application hasn't selected one, an exception will occur indicating a programming error.
Using a database router allows different databases to exist on different hosts, rather than relying on a USE db; statement that assumes all databases are accessible through a single connection.
Disadvantages
It's complex to set up, and there are quite a few layers involved to get it functioning.
The need and use of thread local data is obscure.
Views are littered with database selection code. This could be abstracted using class based views to automatically choose a database based on request parameters in the same manner as middleware would choose a default database.
The context manager to choose a database must be wrapped around a queryset in such a manner that the context manager is still active when the query is evaluated.
Suggestions
If you want flexible database access, I'd suggest using Django's database routers. Use middleware or a view mixin which automatically sets up a default database for the connection based on request parameters. You might have to resort to thread-local data to store the default database so that, when the router is hit, it knows which database to route to. This allows Django to use its existing persistent connections to a database (which may reside on different hosts if desired), and chooses the database to use based on routing set up in the request.
This approach also has the advantage that the database for a query can be overridden if needed by using the QuerySet using() function to select a database other than the default.
For the record, I chose to implement a variation of my first idea: issue a USE <dbname> in an early request middleware. I also set the CACHE prefix the same way.
I'm using it on a small production site, looking up the tenant name from a Redis database based on the request host. So far, I'm quite happy with the results.
I've turned it into a (hopefully resuable) github project here: https://github.com/mik3y/django-db-multitenant
You could create a simple middleware of your own that determined the database name from your sub-domain or whatever, and then executed a USE statement on the database cursor for each request. Looking at the django-tenant-schemas code, that is essentially what it is doing: it subclasses psycopg2 and issues the Postgres equivalent of USE, "SET search_path ...". You could create a model to manage and create your tenants too, but then you would be re-writing much of django-tenant-schemas.
There should be no performance or resource penalty in MySQL to switching the schema (db name). It is just setting a session parameter for the connection.

MongoDB - proper use of collections?

In Mongo, my understanding is that you can have databases and collections. I'm working on a social-type app that will have blogs and comments (among other things) and had previously been using MySQL with pretty heavy partitioning in an attempt to limit possible concurrency issues.
With MySQL I've stuffed all my user data into a _user database with several tables to further partition the data (blogs, pages, etc).
My immediate reaction with Mongo would be to create a 'users' database with one collection per user. In this way, user 'zach's blog entries would go into the 'zach' collection, with associated comments and such becoming sub-objects in the same collection. Basically like dynamically creating one table per user in MySQL, but apparently without the complexity and limitations that might impose.
Of course since I haven't really used Mongo before I'm having trouble gauging the (ahem..) quality of this idea and the potential problems it might cause down the road.
I'd like user data to be treated a lot like a user's home directory in a *nix environment, where user-created/(mostly) non-shared data gets put into one place (currently with MySQL that would be the appname_users database mentioned above).
Most of the user data will be specific to the user's page(s). Some of the user data which is queried across all site users (searchable user profiles) is currently kept in a separate database/table, and I expect things like this could be put into an appname_system database and be broken up into collections and/or application-specific databases (appname_profiles).
Anyway, since the available documentation on this is currently a little thin and my experience is extremely limited I thought I might find a little guidance from someone with a better working understanding of the system.
On the plus side, I'd already been attempting to treat MySQL as a schema-less document store, and doing this with Mongo seems much more intuitive/sane/rational, so I'm really looking forward to getting started.
Thanks,
Zach
I have the same kind of application.
Some things to consider: you can cross-query between collections, but not between databases.
So it's probably better to have one database with all your data and then a collection for each object type.
Each document can then contain any kind and number of fields.
I tried to avoid embedding arrays because I had trouble querying my objects properly (it was working fine, but the architecture of my system was designed for that use).
And a database can be sharded across several servers automatically, so space is not an issue (if you have more than one server).
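For instance, a minimal sketch of that layout with the Node.js MongoDB driver (the database and collection names are just examples):

const { MongoClient } = require('mongodb');

async function example() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const db = client.db('appname'); // one database holds all the data

  // One collection per object type; each document records the owning user.
  await db.collection('blogs').insertOne({
    user: 'zach',
    title: 'First post',
    comments: [{ user: 'alice', text: 'Nice!' }]
  });

  // Cross-user queries stay within the same collection, filtered by a field.
  const zachBlogs = await db.collection('blogs').find({ user: 'zach' }).toArray();

  await client.close();
  return zachBlogs;
}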

Multiple database connections in Rails

I'm writing a simpler version of phpMyAdmin in Rails; this web app will run on a web server (where users will be able to indicate the database name, hostname, username, password, and port number of one of the database servers running on the same network). The user will then be connected to that machine and will be able to use the UI to administer that database (add or remove columns, drop tables, etc).
I have two related questions (your help will greatly aid me in understanding how to best approach this):
In a traditional Rails application I would store the database info in database.yml, however here I need to do it dynamically. Is there a good way to leave the database.yml file empty and tell Rails to use the connection data provided by the user at run time instead?
Different users may connect to different databases (or even hosts). I assume that I need to keep track of the association between an established database connection and a user session. What's the best way to achieve this?
Thank you in advance.
To prevent Rails from initializing ActiveRecord using database.yml, you can simply remove :active_record from config.frameworks in config/environment.rb. Then, to manually establish connections, you use ActiveRecord::Base.establish_connection. (And maybe ActiveRecord::Base.configurations)
ActiveRecord stores everything connection-related in class variables. So if you want to dynamically create multiple connections, you also have to dynamically subclass ActiveRecord::Base and call establish_connection on that.
This will be your abstract base class for any subclass you'll use to actually manage tables. To make ActiveRecord aware of this, you should do self.abstract_class = true within the base class definition.
Then, each table you want to manage will in turn dynamically subclass this new abstract base class.
This is more difficult, because you can't really persist connections, of course. The immediate solution I can think of is storing a unique token in the session, and use that in a before_filter to get back to the dynamic ActiveRecord::Base subclass, which you'll probably be storing in a hash somewhere.
This gets more interesting once you start running multiple Rails worker processes:
You will have to store all of the database connection information in the session, so other workers can use it.
You probably want a consistent unique token across workers, so use a hash function on a combination of database connection parameters.
Because a worker may be called with a token it doesn't yet know about, your subclassing and establish_connection logic will probably happen in the before_filter. (Rather than the moment of login, for example.)
You will have to figure out some clever way of garbage collecting connections and classes for when a user doesn't properly log out and the session expires. (Sorry, I don't know this one.)