I'm working on school-management software in ASP that connects to a MySQL database. The software works well when I deploy it on a local machine for each user (school), but I want to migrate it to the Azure cloud. Users will have accounts on the same app, but each school's data must not mix with other schools' data. My problem is finding the best way to deploy and manage the database.
Should I deploy one database per school, or keep all the schools' data in the same database?
I'm not sure either of these is the best approach.
For example, I don't want a single STUDENT table containing the students of school X, school Y, and so on.
Please help me find the best solution.
There are multiple ways to design a schema that supports multi-tenancy. Which design is simplest depends on the use case.
Separate the data of every tenant (school) physically, i.e., each schema contains data for only one tenant.
Pros:
Easy for A/B testing. You can release updates that require database changes to some tenants first and make them available to others over time.
Easy to move a database from one data center to another, and to support different backup SLAs for different customers.
Per-tenant, database-level customization is easy: adding a new table for a customer, or adding or modifying a field, is straightforward.
Third-party integrations are relatively easy, e.g., connecting your data to Google Data Studio.
Scaling is relatively easy.
Retrieving one tenant's data is easy, without worrying about foreign key values getting mixed up across tenants.
Cons:
When you modify a field or table, your application code must handle the case where the change has not yet been completed in some of the databases.
Cross-customer analytics become difficult, and queries for usage analysis are harder to design.
Integrating with other database systems, especially NoSQL ones, requires more resources. For example, indexing data in Elasticsearch requires one index per tenant, and with thousands of customers that means thousands of shards.
Data common to all tenants must be copied into every database.
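To make the database-per-tenant option and its first con concrete, here is a minimal Python sketch, assuming in-memory SQLite databases stand in for one MySQL database per school. `TenantRouter` and `MIGRATIONS` are hypothetical names. Each tenant records its own schema version, so application code can check whether a migration has reached a given school before relying on a new column.

```python
import sqlite3

# Hypothetical ordered schema migrations, applied to every tenant database in turn.
MIGRATIONS = [
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE student ADD COLUMN grade INTEGER",
]


class TenantRouter:
    """Routes each tenant (school) to its own, physically separate database."""

    def __init__(self):
        self._dbs = {}

    def connect(self, tenant):
        # An in-memory SQLite database stands in for one MySQL database per school.
        if tenant not in self._dbs:
            conn = sqlite3.connect(":memory:")
            conn.execute("CREATE TABLE schema_version (version INTEGER NOT NULL)")
            conn.execute("INSERT INTO schema_version VALUES (0)")
            conn.commit()
            self._dbs[tenant] = conn
        return self._dbs[tenant]

    def version(self, tenant):
        row = self.connect(tenant).execute("SELECT version FROM schema_version").fetchone()
        return row[0]

    def migrate(self, tenant, target=len(MIGRATIONS)):
        # Apply pending migrations up to `target`; tenants may lag behind one another.
        conn = self.connect(tenant)
        for step in range(self.version(tenant), target):
            conn.execute(MIGRATIONS[step])
            conn.execute("UPDATE schema_version SET version = ?", (step + 1,))
            conn.commit()
```

A school whose version is still 1 has no `grade` column yet, so code that reads it has to check `router.version(school)` first; the same table name in two schools never shares rows.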
Separate the data of every tenant (school) logically, i.e., one schema contains data for all the tenants.
Pros:
Software releases are simple.
Easy to query usage analytics across multiple tenants.
Cons:
Scaling is relatively tricky and may require database sharding.
Maintaining the logical isolation of every tenant's data in all the tables requires more attention, and can cause data corruption if the application does not handle it carefully.
Designing a database setup for the application that supports multiple regions is complicated.
Extracting all the data of a single tenant is difficult. (Remember: every record is associated with other records through foreign keys.)
This is not a comprehensive list; it is based on my experience working with both types of design. Both designs are common and are used by many organizations, depending on the use case.
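The shared-schema option can be sketched the same way, assuming an in-memory SQLite database in place of MySQL. The `school_id` column and the `TenantScopedRepo` wrapper are hypothetical; the point is that isolation now depends on every query carrying the tenant filter, while cross-tenant analytics become a single query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
PRAGMA foreign_keys = ON;
CREATE TABLE school (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Every tenant-owned table carries the owning school's id.
CREATE TABLE student (
    id INTEGER PRIMARY KEY,
    school_id INTEGER NOT NULL REFERENCES school(id),
    name TEXT NOT NULL
);
CREATE INDEX student_by_school ON student (school_id);
INSERT INTO school (id, name) VALUES (1, 'School X'), (2, 'School Y');
""")


class TenantScopedRepo:
    """Hypothetical data-access layer that adds the school filter to every query."""

    def __init__(self, conn, school_id):
        self.conn = conn
        self.school_id = school_id

    def add_student(self, name):
        self.conn.execute(
            "INSERT INTO student (school_id, name) VALUES (?, ?)", (self.school_id, name)
        )

    def students(self):
        rows = self.conn.execute(
            "SELECT name FROM student WHERE school_id = ? ORDER BY name", (self.school_id,)
        )
        return [name for (name,) in rows]


def students_per_school(conn):
    # Cross-tenant usage analytics is one query in the shared-schema design.
    return conn.execute(
        "SELECT school_id, COUNT(*) FROM student GROUP BY school_id ORDER BY school_id"
    ).fetchall()
```

Routing all access through one scoped layer like this is one way to reduce the risk of a query that forgets the tenant filter.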
I have read in many online sources that one advantage of graph databases is a flexible schema, but I haven't found out how exactly that is achieved.
Wikipedia says 'Graphs are flexible, meaning it allows the user to insert new data into the existing graph without loss of application functionality.' But that is something we can also do in a relational database, at least to an extent.
Can someone shed more light on this?
Thanks.
Edit: here is an example to make it clearer:
Take for example a User table:
FirstName|LastName|email|Facebook|Twitter|Linkedin|
Now, some users might have Facebook but not Twitter or LinkedIn, or vice versa. Some might have multiple email addresses. How do you represent that?
Graph DB:
Vertices for User, FB_Link, Twitter_Link, Linkedin_Link, Email.
Edges from User to FB_Link, Twitter_Link, Linkedin_Link, and Email (twice, for two addresses), etc.
Json/DocumentDB:
{
ID:
FirstName:
LastName:
FB:
}
{
ID:
FirstName:
LastName:
Twitter:
Linkedin:
}
Notice that the documents can have different attributes.
Am I correct in this interpretation of schema flexibility? Is there more to it?
The Wikipedia article is oversimplifying things with the statement:
allows the user to insert new data into the existing graph without loss of application functionality
Any database allows you to insert data without losing application functionality. Instead, let's focus on the flexible-schema side of graph databases, because that is where the difference lies.
A Brief Side Note
SQL databases are built on the relational model, which enforces strong consistency checks between data records. They do this by enforcing locks on structural changes. Graph databases are built on the property graph model, which enforces no such relational constraints, and that means no locks (in most cases). The model only requires that data be stored as key-value pairs on constructs called vertices, which are connected together by edges.
With that context in place, let's turn to your main question:
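A toy in-memory property graph shows what "only key-value pairs on vertices connected by edges" means in practice. This is a minimal Python sketch, not how any particular graph database stores data; `PropertyGraph`, `Vertex`, and `Edge` are hypothetical names. Two vertices with the same label can carry different keys, and a user can have zero, one, or several edges of a given kind, with no table definition to migrate.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Vertex:
    label: str
    props: dict = field(default_factory=dict)  # arbitrary key-value pairs


@dataclass
class Edge:
    label: str
    src: int
    dst: int
    props: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy in-memory property graph: no table definitions, no column constraints."""

    def __init__(self):
        self._ids = count()
        self.vertices = {}
        self.edges = []

    def add_vertex(self, label, **props):
        vid = next(self._ids)
        self.vertices[vid] = Vertex(label, props)
        return vid

    def add_edge(self, label, src, dst, **props):
        self.edges.append(Edge(label, src, dst, props))

    def neighbors(self, vid, label):
        # Follow outgoing edges with the given label.
        return [self.vertices[e.dst] for e in self.edges if e.src == vid and e.label == label]
```

For the User example in the question, a user with two email addresses simply gets two `HAS_EMAIL` edges, and a user without Twitter has no `HAS_TWITTER` edge at all.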
How Is a Flexible Schema Achieved
Because the property graph model formally has no constraint rules that must be satisfied for it to function, you can impose a graph representation on pretty much any data source, technically even on a SQL table with no indices, if you so choose.
How this is done in practice, though, varies from one graph database to another; the field currently lacks standardization. For example:
JanusGraph runs on pluggable NoSQL storage backends, such as wide-column stores (Cassandra, HBase) and key-value stores (BerkeleyDB).
OrientDB is built on a JSON document store.
RedisGraph runs inside Redis, an in-memory key-value store.
Neo4j uses its own native storage format, which I am not familiar with.
Note how the above use NoSQL databases as their backbone; this is how a flexible schema is achieved. These graph databases simply store the data outside relational databases and tables, where there are "no" rules. Document stores and JSON are a good example.
If you are curious about an example implementation of the property graph model, you can check out Neo4j's model, JanusGraph's docs, or a generic comparison of their models.
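To illustrate layering a graph on a schemaless store, here is a minimal Python sketch in which a plain dict of JSON strings stands in for a document or key-value store. `DocumentBackedGraph` is a hypothetical name, and this is a simplified illustration, not how OrientDB or JanusGraph actually lay out their data. Each vertex is one JSON document with its outgoing edges embedded, so two `User` vertices can hold entirely different keys.

```python
import json


class DocumentBackedGraph:
    """Toy graph layer over a key-value/document store (a dict of JSON strings).

    Each vertex is one JSON document; its outgoing edges are stored inside it,
    so vertices with the same label may carry completely different keys.
    """

    def __init__(self):
        self.store = {}  # stand-in for a document store: key -> JSON text

    def put_vertex(self, key, label, **props):
        self.store[key] = json.dumps({"label": label, "props": props, "out": []})

    def add_edge(self, src, label, dst):
        # Read-modify-write of the source vertex's document.
        doc = json.loads(self.store[src])
        doc["out"].append({"label": label, "to": dst})
        self.store[src] = json.dumps(doc)

    def get(self, key):
        return json.loads(self.store[key])

    def out(self, key, label):
        return [e["to"] for e in self.get(key)["out"] if e["label"] == label]
```

Nothing in the store checks which keys a document has, which is where the "no rules" flexibility comes from; the trade-off is that any consistency checks move into application code.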
Is there anything out there (a cloud JSON datastore offering that supports mobile apps) as performant and feature-rich as GCP Firestore in Native mode, but that provides namespaces?
The lack of this feature, which I would expect of Firestore (or of any database, for that matter) in Native mode, is a deal killer all around, given the nature of the software development cycle: the need for multiple environments (dev, test, prod), continuous deployment and delivery pipelines, JSON data models that use namespaces, and much more.
If not, what are you doing on your really big Firestore project to create development, test, and integration environments (or areas) that people can work in, or to support separately running related applications in production that each need their own namespace or set of collections? Ideally without creating and managing a bazillion projects, Firestore Native instances, and service accounts (each project/Firestore instance needs a service account .json file to be created, distributed to developers, and securely stored), with each additional instance adding more management overhead. And without running Firestore in Datastore mode, where you lose all the advantages, features, and main selling points that led you to choose Firestore for your app in the first place.
Optional Reading: Background / Context:
I recently joined a new project and was asked to create a JSON data model for various services (independently running programs) that make up the whole, and to set up sample data for multiple runtime environments like 'dev1', 'dev2', 'test', and 'prod', where the data model might be in flux, or differ in 'dev' or 'test' for a period, until the next production deployment of an updated data model. I have done this in the past with GCP Datastore and other databases of all types (NoSQL and Not NoSQL).
At the time, the JSON document store (database) had not been chosen. While I was working on the data model and a plan for multiple environments to support development, I was told that the chosen datastore was Firestore. Then, while implementing the basic CRUD operations to insert sample data, and trying to create separate sandbox areas in the same Firestore instance where 'dev1' and 'dev2' could work and be as destructive as they want within their own area without affecting each other, I found that namespaces are not supported in Native mode (and the client wants Native mode and what it offers; otherwise they will look at another product).
So now, where we thought we would need only two projects, each with a Firestore instance, we would need thirty-six instances if we stick with Firestore in Native mode across the board. Because of this, I am seeking input on what others do to avoid or minimize so many projects/instances. I have a solution to submit to the company that involves not using Firestore, but I thought I would ask before abandoning it. We need to segregate, isolate, partition, and compartmentalize data for common software-development-lifecycle needs and for our own application runtime needs, while matching the production infrastructure as closely as possible in each of these environments.
Our need for namespaces has nothing to do with supporting multiple clients or multi-tenancy, which the Google documentation I have found often cites as seemingly the primary (and only) use case. Historically, that is one of the less frequently implemented uses of namespaces, out of many hundreds.
We want at most two projects and database instances (two Firestore Native instances):
Production
Everything else under the sun that is not production: 'dev1', 'dev2', 'test1', 'test2', 'tmp-test-whatever'
With any database product, you should need only one database instance, with a mechanism to segregate and isolate data and data definitions, creating as many namespaces as you want within that database. Some database products refer to each namespace as a database. I want to distinguish here between the runtime software I call a "database" or "database instance" and the area where data is defined and contained (the namespace).
Oracle, PostgreSQL, and others call these segregated areas "schemas". XML and many other data formats support the notion of "namespaces" to provide isolation and avoid collisions between definitions with the same name.
Google Datastore supports namespaces (for multi-tenancy, they say), but not in a way that isolates and protects each namespace as other database products do. Anyone with access to that Datastore instance can do anything with ALL namespaces, and there is no way to create a service account whose access is restricted to a particular namespace.
With this Firestore-backed project in production, multiple separately running services will at any one time be hitting what we had hoped would be a single Firestore instance in Native mode. Some will run on mobile; some will run on another VM instance (CRUD operations on various collections/documents initiated by the web app). All services are part of the same product, and some of them have collections with the same name.
Example: a 'users' collection:
{ service1: <== 'service1' is the namespace, it has multiple collections, 'users' which is just one for example.
{ users:
{ user: {
login: <login_name>
<other fields>:
}
}
}
Now another namespace, which also has a 'users' collection, but with a different data definition and a different set of data from the above:
{ service2: <== 'service2' is the namespace
{ users:
{ user: {
first_name: <first_name>
last_name: <last_name>
<other fields>:
}
}
}
----
...and other services have their own collections as well.
Other use cases for namespaces, as mentioned above:
an environment, like 'dev' or 'test', for modifying any collection, such as adding to or reworking the data model during development.
a unit test that inserts data into a unique namespace temporarily devoted to just that test: the data is inserted, the test runs, and at the end all data belonging to that temporary namespace is deleted.
a namespace used by the mobile app portion of the product
a namespace to support the web app portion of the product (we are trying to use one datastore product for the entire product)
a namespace for CI to do various things
I proposed something that would work for the data model in Firestore Native mode, but it is very undesirable and kludgy: putting the service name and environment in the collection name (dev1_service1_users, dev1_service2_users, and so on) to distinguish them and avoid collisions.
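The prefix scheme proposed above can be sketched in Python, assuming a hypothetical in-memory `FakeStore` in place of Firestore; no Firestore client calls are shown or implied. `collection_name` builds ids like `dev1_service1_users`, and `temp_namespace` covers the unit-test use case: it gives a test a unique prefix and drops every collection carrying that prefix afterwards. With real Firestore, the cleanup step would have to delete the documents in those collections through the client library, which is not modeled here.

```python
import uuid
from contextlib import contextmanager

SEP = "_"


def collection_name(env, service, collection):
    """Build a prefixed collection id such as 'dev1_service1_users'."""
    for part in (env, service):
        # A separator inside env/service would make prefixes ambiguous.
        if SEP in part:
            raise ValueError(f"{part!r} must not contain {SEP!r}")
    return SEP.join((env, service, collection))


class FakeStore:
    """In-memory stand-in for a document database: collection id -> {doc id: dict}."""

    def __init__(self):
        self.collections = {}

    def put(self, collection, doc_id, data):
        self.collections.setdefault(collection, {})[doc_id] = data

    def drop_prefix(self, prefix):
        for name in [c for c in self.collections if c.startswith(prefix)]:
            del self.collections[name]


@contextmanager
def temp_namespace(store, service):
    """Give one test a unique 'environment' prefix and delete its collections afterwards."""
    env = "tmp" + uuid.uuid4().hex[:8]
    try:
        yield lambda collection: collection_name(env, service, collection)
    finally:
        store.drop_prefix(env + SEP)
```

This only emulates isolation by naming convention: nothing stops code with access to the instance from reading or deleting another prefix's collections.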
Firestore Native mode gives you one namespace, which they call the default, but it is not a namespace at all; it is the complete absence of one.
The only solution I see is to drop Firestore for some other JSON datastore that gets us close to what Firestore Native offers: one we would install, update, and manage on a VM in the cloud, along with all the infrastructure that entails (load balancing and much more).
I will post the direction we take for anyone who is interested or has a similar problem.
I am structuring a NoSQL database for the first time. I have a user table that contains a name and an email address. Each user can have more than one device.
Each device basically has an array of readings.
Here is what my current structure looks like:
How can I improve this structure?
PS: I am using AngularJS with AngularFire.
In relational databases, there is the concept of normal forms, and with it a somewhat objective measure of whether a data model is normalized.
In NoSQL databases you often end up modeling the data for the way your app consumes it. Hence there is little notion of what constitutes a good data model without also considering your app's use cases.
That said, the Firebase documentation recommends flattening your data. More specifically, it recommends against mixing types of data, as you are doing with user metadata and device metadata.
The recommendation would be to split them into two top-level nodes:
/users
  <userid1>
    email:
    id:
    name:
  <userid2>
    email:
    id:
    name:
/devices
  <userid>
    <deviceid1>
      <measurement1>
      <measurement2>
    <deviceid2>
      <measurement1>
      <measurement2>
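The flattening recommendation can be sketched with plain Python dicts standing in for the JSON tree (AngularFire is not involved); the nested input shape and the `flatten` helper are assumptions for illustration. In Firebase, reading a location also retrieves everything nested under it, so keeping readings out of /users lets the app list users without downloading every device reading.

```python
def flatten(nested_users):
    """Split a nested {userid: {name, email, devices}} tree into /users and /devices.

    Assumed input shape (hypothetical):
    {"u1": {"name": ..., "email": ..., "devices": {"d1": [reading, ...]}}}
    """
    users, devices = {}, {}
    for uid, data in nested_users.items():
        # /users/<userid> holds only the small user metadata.
        users[uid] = {"email": data["email"], "id": uid, "name": data["name"]}
        # /devices/<userid>/<deviceid> holds the (potentially large) readings.
        user_devices = data.get("devices", {})
        if user_devices:  # skip users that have no devices
            devices[uid] = {dev_id: list(readings) for dev_id, readings in user_devices.items()}
    return {"users": users, "devices": devices}
```

Both top-level nodes share the same user id as key, so a user's devices can still be loaded on demand from /devices/<userid>.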
Further recommended reading: NoSQL data modeling, and watching our Firebase for SQL Developers series. Oh, and of course, the Firebase documentation on structuring data.
We are building a large-scale e-commerce website to serve over 100,000 users, and we expect the number of users to grow rapidly over the first year. In general, the site functions much like eBay: users can create, update, and remove listings, and can also search listings and purchase an item of interest. Basically, the system has transactional and non-transactional requirements:
**Transactional**
Create a listing (multi-record update)
Remove a listing
Update a listing
Purchase a listing (multi-record update)
**Non-Transactional**
Search listings
View a listing
We want to leverage the power of scalable, document-based NoSQL data stores such as CouchDB or MongoDB, but at the same time we need a relational store to support our ACID transactional requirements. So we have come up with a hybrid solution that uses both technologies.
Since the site is "read mostly", and to meet the scalability needs, we set up a MongoDB data store. For the transactional needs, we set up a MySQL Cluster. As the middleware component, we use a JBoss application server cluster.
When a "search" request comes in, JBoss directs it to MongoDB, which should produce very quick results without burdening MySQL. When a listing is created, updated, removed, or purchased, JBoss services the transaction against MySQL. To keep MongoDB and MySQL synchronized, every transactional request that JBoss handles against MySQL includes a final step in the business logic that updates the corresponding MongoDB document via the listing id; we plan to use the MongoDB Java API for this integration.
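A minimal sketch of this write path, assuming SQLite stands in for MySQL Cluster and a plain dict stands in for the MongoDB listings collection; `ListingService` and `_sync` are hypothetical names, and this is not JBoss or MongoDB Java API code. The purchase is atomic in the relational store, but the sync step runs after the commit, so the search copy can lag behind MySQL (indefinitely, if that step fails and nothing retries it).

```python
import sqlite3


class ListingService:
    """Sketch: commit in the relational store, then refresh the search document."""

    def __init__(self):
        self.sql = sqlite3.connect(":memory:")
        self.sql.executescript("""
            CREATE TABLE listing (id INTEGER PRIMARY KEY, title TEXT NOT NULL,
                                  quantity INTEGER NOT NULL);
            CREATE TABLE purchase (id INTEGER PRIMARY KEY,
                                   listing_id INTEGER NOT NULL REFERENCES listing(id),
                                   buyer TEXT NOT NULL);
        """)
        self.search_docs = {}  # stand-in for the MongoDB collection, keyed by listing id

    def _sync(self, listing_id):
        # Copy the committed row into the read-side document (or remove it).
        row = self.sql.execute(
            "SELECT id, title, quantity FROM listing WHERE id = ?", (listing_id,)
        ).fetchone()
        if row is None:
            self.search_docs.pop(listing_id, None)
        else:
            self.search_docs[listing_id] = {"_id": row[0], "title": row[1], "quantity": row[2]}

    def create(self, title, quantity):
        with self.sql:  # commits on success, rolls back on exception
            cur = self.sql.execute(
                "INSERT INTO listing (title, quantity) VALUES (?, ?)", (title, quantity)
            )
        self._sync(cur.lastrowid)
        return cur.lastrowid

    def purchase(self, listing_id, buyer):
        with self.sql:  # multi-record update: both statements commit or roll back together
            updated = self.sql.execute(
                "UPDATE listing SET quantity = quantity - 1 WHERE id = ? AND quantity > 0",
                (listing_id,),
            ).rowcount
            if updated == 0:
                raise ValueError("listing is sold out or does not exist")
            self.sql.execute(
                "INSERT INTO purchase (listing_id, buyer) VALUES (?, ?)", (listing_id, buyer)
            )
        # Final step, outside the SQL transaction.
        self._sync(listing_id)
```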
So, in essence, since the site is read mostly, the architecture allows us to scale out MongoDB horizontally to accommodate more users. Using MySQL allows us to leverage the ACID properties of relational databases while keeping our MongoDB store updated through the JBoss middleware.
Is there anything wrong with this architecture? No platform can offer consistency, availability, and partition-tolerance at the same time -- NoSQL systems usually give up consistency -- but at least with this hybrid approach we can realize all three at the cost of additional complexity in the system, and we are ok with that since all of our requirements are being met.
There is nothing wrong with this approach.
In fact, I am currently also working on an e-commerce application that leverages both SQL and NoSQL. Ours is a Rails application: 90% of the data is stored in MongoDB, and only transactional and inventory items are stored in MySQL. All the transactions are handled in MySQL, and everything else goes to MongoDB.
If you have already built it, there isn't too much wrong with the architecture aside from being a little too enterprisey. Starting from scratch on a system like this though, I'd probably leave out SQL and the middleware.
The loss of consistency in NoSQL data stores isn't as complete as you suggest. Aside from the fact that many of them do support transactions and can be set up for immediate consistency on particular queries, I suspect some of your requirements are simply an artefact of designing things relationally. Your concern seems to be around operations that require updates to multiple records: is a listing really multiple records, or is it just set up that way because SQL records have to have a flat structure?
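The point about a listing not really being multiple records can be illustrated with a minimal sketch, assuming a hypothetical in-memory `DocStore` that stands in for a document database. The seller, images, stock, and buyers live in one document, so a purchase becomes a single conditional update of one document. MongoDB, for example, documents that a write to a single document is atomic; the lock here only imitates that guarantee and is not MongoDB's API.

```python
import copy
import threading


class DocStore:
    """In-memory stand-in for a document store with atomic single-document updates."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()

    def insert(self, doc):
        with self._lock:
            self._docs[doc["_id"]] = copy.deepcopy(doc)

    def find(self, doc_id):
        with self._lock:
            return copy.deepcopy(self._docs[doc_id])

    def update_if(self, doc_id, predicate, mutate):
        """Apply mutate(doc) only if predicate(doc) holds; both run under one lock."""
        with self._lock:
            doc = self._docs[doc_id]
            if not predicate(doc):
                return False
            mutate(doc)
            return True


# One listing as one document, instead of rows spread across several tables.
listing = {
    "_id": "listing-1",
    "title": "Vintage camera",
    "seller": {"id": "u42", "name": "Ann"},
    "images": ["front.jpg", "back.jpg"],
    "quantity": 1,
    "buyers": [],
}


def purchase(store, listing_id, buyer):
    def take(doc):
        doc["quantity"] -= 1
        doc["buyers"].append(buyer)

    # Stock check and update happen together on a single document.
    return store.update_if(listing_id, lambda d: d["quantity"] > 0, take)
```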
Also, if search and view are handled outside of MySQL, you have effectively set up an eventual consistency system anyway.