Firebase/NoSQL Database Structure - json

I am trying to structure a NoSQL database for the first time. I have a user table which contains a name and an email address. Each user can have more than one device.
Each device basically has an array of readings.
Here is what my current structure looks like:
How can I improve this structure?
P.S.: I am using AngularJS with AngularFire.

In relational databases there is the concept of normal forms, and with it a somewhat objective measure of whether a data model is normalized.
In NoSQL databases you often end up modeling the data for the way your app consumes it. Hence there is little concept of what constitutes a good data model without also considering the use cases of your app.
That said: the Firebase documentation recommends flattening your data. More specifically, it recommends against mixing types of data, as you are doing with user metadata and device metadata.
The recommendation would be to split them into two top-level nodes:
/users
  <userid1>
    email:
    id:
    name:
  <userid2>
    email:
    id:
    name:
/devices
  <userid>
    <deviceid1>
      <measurement1>
      <measurement2>
    <deviceid2>
      <measurement1>
      <measurement2>
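As a rough sketch in plain JavaScript (plain objects, not the Firebase SDK; all ids and values below are invented for illustration), the split means a client can load a user's profile without also pulling down every reading:

```javascript
// Plain objects standing in for the flattened layout above.
const db = {
  users: {
    userid1: { email: "alice@example.com", id: "userid1", name: "Alice" }
  },
  devices: {
    userid1: {
      deviceid1: { measurement1: 20.5, measurement2: 21.0 }
    }
  }
};

// Loading a profile touches only /users/<userid> ...
const profile = db.users.userid1;

// ... while the (potentially large) readings live under a separate
// top-level node, keyed by the same user id.
const readings = db.devices.userid1.deviceid1;

console.log(profile.name);          // "Alice"
console.log(readings.measurement1); // 20.5
```

The same lookup shape applies with AngularFire: you would bind to the two paths independently instead of syncing one deeply nested object.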
Further recommended reading is NoSQL data modeling, and viewing our Firebase for SQL developers series. Oh, and of course, the Firebase documentation on structuring data.

Related

How does Graph Database achieve Flexible Schema?

I have read in a lot of online sources that one of the advantages of graph databases is a flexible schema, but I haven't found how exactly that is achieved.
Wikipedia says 'Graphs are flexible, meaning it allows the user to insert new data into the existing graph without loss of application functionality.' But that is something we can do in a relational database also, at least to an extent.
Can someone shed more light on this?
Thanks.
Edit: Giving an example to make it more clear:
Take for example a User table:
FirstName | LastName | Email | Facebook | Twitter | LinkedIn
Now, some users might have FB but not Twitter and LinkedIn, or vice versa. Maybe they have multiple email IDs? How do you represent that?
Graph DB:
Vertices for User, FB_Link, Twitter_Link, Email.
Edges from User: an edge to FB, an edge to Twitter, an edge to LinkedIn, an edge to Email (x 2), etc.
Json/DocumentDB:
{
  ID:
  FirstName:
  LastName:
  FB:
}
{
  ID:
  FirstName:
  LastName:
  Twitter:
  Linkedin:
}
Notice that the documents can have different attributes.
Am I Correct in the above interpretation of Schema Flexibility? Is there more to it?
The Wikipedia article is oversimplifying things with the statement:
allows the user to insert new data into the existing graph without loss of application functionality
Any database allows you to insert data without losing application functionality. Rather, let's focus on the flexible-schema side of graph databases, because that is where the real difference lies.
A Brief Side Note
SQL is built on the relational model, which enforces strong consistency checks between data records. It does this by enforcing locks on structural changes. Graph databases are built on the property graph model, which enforces no such relational constraints. That means no locks (in most cases). It only enforces key-value pairs on constructs called vertices, connected together via edges.
With that bit of context discussed, let's talk about your main question:
How Is a Flexible Schema Achieved
Because property graphs formally have no constraint rules to satisfy in order to function, you can impose a graph representation on pretty much any data source. Technically even on a SQL table with no indices, if you so chose.
How this is done in practice, though, varies from graph DB to graph DB; the field lacks standardisation at the moment. For example:
JanusGraph runs on different NoSQL DBs, such as a wide column store or a document store.
OrientDB uses a JSON document store.
RedisGraph uses an in-memory key-value store.
Neo4j uses its own data model, which I am not familiar with.
Note how all of the above use NoSQL DBs as the backbone. This is how a flexible schema is achieved: these graph databases simply store the data outside of relational DBs/tables, where there are "no" rules. Document stores and JSON are a good example.
If you're curious about an example implementation of a property graph model, you can check out Neo4j's model, JanusGraph's docs, or a generic comparison of their models.
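As a rough illustration of why no schema is needed (plain JavaScript with invented names, not any particular graph DB's API): a property graph is just vertices holding arbitrary key-value pairs, connected by labeled edges, and nothing forces two User vertices to share the same keys:

```javascript
// Minimal in-memory property graph: vertices are bags of key-value
// pairs; edges connect vertex ids with a label. Nothing enforces a
// shared set of keys across vertices of the same "type".
const vertices = {
  u1: { type: "User", FirstName: "Ann", Twitter: "@ann" },
  u2: { type: "User", FirstName: "Bob", FB: "bob.fb", Email: "bob@x.com" }
};
const edges = [
  { from: "u1", to: "u2", label: "FOLLOWS" }
];

// Adding a brand-new property later needs no schema migration:
vertices.u1.Linkedin = "ann-ln";

// Traversal is just following edges:
const followed = edges
  .filter(e => e.from === "u1" && e.label === "FOLLOWS")
  .map(e => vertices[e.to].FirstName);
console.log(followed); // [ "Bob" ]
```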

Storing Visualizations and Analysis in Database

I am currently working on a web-application that would allow users to analyze & visualize data. For example, one of the use-cases is that the user will perform a Principal Component Analysis and store it. There can be other such analysis like a volcano plot, heatmap etc.
I would like to store these analysis and visualizations in a database in the back-end. The challenge that I am facing is how to design a relational database schema which will do this efficiently. Here are some of my concerns:
The data associated with the project will already be stored in a normalized manner so that it can be recalled. I would not like to store it again with the visualization.
At the same time, the user should be able to see the original data behind a visualization. For example, what data was fed to the PCA algorithm? The user might not use all the data associated with the project for the PCA; he/she could be doing it on just a subset of the data in the project.
The number of visualizations associated with the web app will grow over time. If I need to design an involved schema every time a new visualization is added, it could make overall development slower.
With these in mind, I am wondering if I should try to solve this with a relational database like MySQL at all. Or should I look at MongoDB? More generally, how do I think about this problem? I tried looking for some blogs/tutorials online but couldn't find much that was useful.
The first step, before thinking about technical design (including whether to use a relational or NoSQL platform), is a data model that clearly describes the structure of and relations between your data in a platform-independent way. I see the following interesting points to solve there:
How is a visualisation related to the data objects it visualizes? When the visualisation just displays the data of one object type (let's say the number of sales per month), this is trivial. But if it covers more than one object type (the number of sales per month, product category, and country), you will have to decide to which of them to link it. There is no single correct solution for this, but it depends on the requirements from the users' view: From which origins will they come to find this visualisation? If they always come from the same origin (let's say the country), it will be enough to link the visuals to that object type.
How will you handle insertions, deletes, and updates of the basic data since the point in time the visualisation has been generated? If no such operations relevant to the visuals are possible, then it's easy: Just store the selection criteria (country = "Austria", product category = "Toys") with the visual, and everyone will know its meaning. If, however, the basic data can be changed, you should implement a data model that covers historizing those data, i.e. being able to reconstruct the data values on which the original visual was based. Of course, before deciding on this, you need to clarify the requirements: Will, in case of changed basic data, the original visual still be of interest or will it need to be re-generated to reflect the changes?
Both questions are neither simplified nor complicated by using a NoSQL database.
No matter what the outcome of those requirements and data modeling efforts are, I would stick to the following principles:
Separate the visuals from the basic data, even if a visual is closely related to just one set of basic data. Reason: The visuals are just a consequence of the basic data that can be re-calculated in case they get lost. So the requirements e.g. for data backup will be more strict for the basic data than for the visuals.
Don't store the basic data redundantly just to show the basis for each single visual. A timestamp on each record of basic data, together with the timestamp of the generated visual, will serve the same purpose with less effort and storage volume.
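A minimal sketch of that timestamp logic (plain JavaScript; field names and values are invented): each basic-data record carries a validFrom timestamp, and a visual stores only its selection criteria plus its generation time, so the data it was based on can be reconstructed without copying it:

```javascript
// Basic data: each record is timestamped instead of being copied
// per visual. A later record with the same logical key is a correction.
const sales = [
  { country: "Austria", month: "2020-01", amount: 100, validFrom: "2020-02-01" },
  { country: "Austria", month: "2020-01", amount: 120, validFrom: "2020-03-15" }, // later correction
  { country: "Germany", month: "2020-01", amount: 200, validFrom: "2020-02-01" }
];

// A visual stores only its selection criteria and generation time.
const visual = { criteria: { country: "Austria" }, generatedAt: "2020-03-01" };

// Reconstruct the data the visual was based on: for each logical record,
// take the latest version that was already valid at generation time.
function dataBehind(visual, records) {
  const matching = records.filter(r =>
    r.country === visual.criteria.country && r.validFrom <= visual.generatedAt);
  const latest = {};
  for (const r of matching) {
    const key = r.country + "|" + r.month;
    if (!latest[key] || r.validFrom > latest[key].validFrom) latest[key] = r;
  }
  return Object.values(latest);
}

// The correction from 2020-03-15 is correctly excluded:
console.log(dataBehind(visual, sales)); // one record, amount 100
```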

Should I use MongoDB for website email marketing?

I have just researched MongoDB and see that it has many advantages. I'm working on a web project providing an email marketing service. In the previous email marketing script I used (ActiveCampaign), when the MySQL database reached 3 GB, my website became sluggish and loaded slowly. I am considering MongoDB for this new project. Do you think MongoDB is suitable for an email marketing website? Should I use it for part of the data or all of it?
Example:
Contacts manager: if I use MySQL, I have to design tables with quite complicated relations: contacts, lists, contact_list, additional_info... Querying them is also unpleasant. If I use MongoDB, I won't have to think as much about the database design, and queries (create, select...) are simpler, especially since I'm building my website with a REST API structure.
The website will also have many log archives: email sends, opens, clicks...
Contact = [
  {
    email: "name@company.com",
    first_name: "Well",
    last_name: "E",
    list_id: [1, 2, 5],
    additional: { phone: "", address: "", im: "" }
  }
]
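As a rough sketch of why this document shape queries nicely (plain JavaScript emulating the query on invented data, not an actual driver call): finding every contact on a given list is a single match against the embedded list_id array, with no contact_list join table involved. In the mongo shell this would be roughly db.contacts.find({ list_id: 2 }), since matching a scalar against an array field matches documents whose array contains that value:

```javascript
// In-memory stand-in for a "contacts" collection; data is invented.
const contacts = [
  { email: "a@example.com", first_name: "Ann", list_id: [1, 2, 5] },
  { email: "b@example.com", first_name: "Bob", list_id: [3] }
];

// Emulates the spirit of find({ list_id: 2 }): the array contains 2.
const onList2 = contacts.filter(c => c.list_id.includes(2));
console.log(onList2.map(c => c.email)); // [ "a@example.com" ]
```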
Here are some pointers on when to choose MongoDB.
It is worth considering MongoDB when your existing database solution/design:
Needs coding around database performance issues, for example by adding lots of caching.
Stores data in flat files.
Is batch-oriented when you really need real-time processing.
Has data that is complex to model in a relational DB.
Has simple requirements for transactions. MongoDB models data as documents, and single-document updates are atomic and durable in MongoDB, unlike in many other NoSQL products.
Has a workload that entails high-volume 'matching' of records, such as trade clearing, transaction reconciliation, fraud detection, or system/software security applications.
Has analytical workloads where one or more of the following are true:
a. the analytics are real-time
b. the data is very complicated to model in a relational schema
c. the data volume is huge
d. the source data is already in a MongoDB database
I hope you can relate these ideas to your scenario.

which database suits my application mysql or mongodb ? using Node.js , Backbone , Now.js

I want to make an application like docs.google.com (without its API, completely on my own server) using
frontend: Backbone
backend: Node
Which database do you think is better, MySQL or MongoDB? It should support good scalability.
I am familiar with MySQL from PHP, and I will be happy if the answer is MySQL.
But many tutorials I saw used MongoDB. Why did they use MongoDB rather than MySQL?
What should I use?
Can anyone give me a link to a sample application (with source) built using Backbone, Node, and MySQL (or Mongo)? Or at least an app with Node and MySQL.
Thanks
With MongoDB, you can just store JSON objects and retrieve them fully-formed, so you don't really need an ORM layer and you spend less CPU time translating your data back-and-forth. The developers behind MongoDB have also made horizontally scaling the database a higher priority and let you run arbitrary Javascript code to pre-process data on the DB side (allowing map-reduce style filtering of data).
But you lose some for these gains: You can't join records. Actually, the JSON structure you store could only be done via joins in SQL, but in MongoDB you only have that one structure to your data, while in SQL you can query differently and get your data represented in alternate ways much easier, so if you need to do a lot of analytics on your database, MongoDB will make that harder.
The query language in MongoDB is "rougher", in my opinion, than SQL's, partly because it's less familiar, and partly because the querying features "feel" haphazardly put together: partly to stay valid JSON, and partly because there are often several ways of doing the same thing, some of them older and not as useful or regularly formatted as the others. And there's the added complexity of the array and sub-object types over SQL's simple row-based design, so the syntax has to be able to handle querying for arrays that contain some of the values you defined, all of the values you defined, only the values you defined, or none of the values you defined. The same distinctions apply to object keys and their values, and this makes the query syntax harder to grasp. (And while I can see the need for edge cases, the $where query parameter, which takes a JavaScript function that is run on every record and returns a boolean, is a siren song: it makes it easy to define which objects you want returned, but it has to run on every record in the database, and no indexes can be used.)
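Those array-matching distinctions can be sketched in plain JavaScript (emulating the spirit of MongoDB's $in and $all operators on invented data; these are not driver calls):

```javascript
// Documents with array fields; data invented for illustration.
const docs = [
  { _id: 1, tags: ["a", "b", "c"] },
  { _id: 2, tags: ["a"] },
  { _id: 3, tags: ["b", "c"] }
];

// "Contains some of the values" -- the spirit of { tags: { $in: [...] } }
const someOf = (wanted) =>
  docs.filter(d => d.tags.some(t => wanted.includes(t)));

// "Contains all of the values" -- the spirit of { tags: { $all: [...] } }
const allOf = (wanted) =>
  docs.filter(d => wanted.every(t => d.tags.includes(t)));

console.log(someOf(["a"]).map(d => d._id));      // [ 1, 2 ]
console.log(allOf(["b", "c"]).map(d => d._id));  // [ 1, 3 ]
```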
So, it depends on what you want to do, but since you say it's for a Google Docs clone, you probably don't care about any representation but the document representation, itself, and you're probably only going to query based on document ID, document name, or the owner's ID/name, nothing too complex in the querying.
Then, I'd say being able to take the JSON representation of the document your user is editing, and just throw it into the database and have it automatically index these important fields, is worth the price of learning a new database.
I was also struggling with this choice looking at the hype created by using MongoDB for tasks it was not built for. So my 2 cents are:
Storing and retrieving hierarchical objects, that your documents probably are, is easier in MongoDB, as David says. It becomes more complicated if you want to store documents that are bigger than 16Mb though - MongoDB's answer is GridFS.
Organising documents in folders, groups, keeping track of which user owns which documents and who he/she provided access to them is definitely easier with MySQL - you have the advantage of powerful SQL queries with joins etc., built in EXPLAIN optimization, triggers, functions, stored procedures, etc. MongoDB is nowhere near.
So what prevents you from using both MySQL to organize the documents and MongoDB to store one collection of documents identified by id (or several collections - one for each document type)? It seems to me the best choice and using two databases in one application is not a problem, really.
MySQL will store users, groups, folders, permissions - whatever you fancy - and for each document it will store a reference to the collection and the document id (MongoDB has a special format for it - DBRefs). MongoDB will store documents themselves in collections, if they are all less than 16MB, or the previews and metadata of documents in collections and the whole documents in GridFS.
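A rough sketch of that split (plain JavaScript with invented table/collection names; the $ref/$id pair follows the conventional DBRef shape mentioned above): a MySQL row carries the organizational columns plus a DBRef-style pointer at the document stored in MongoDB:

```javascript
// Stand-in for the MongoDB side; collection and data are invented.
const mongoCollections = {
  documents: {
    "doc-42": { title: "Quarterly report", body: "..." }
  }
};

// What a MySQL row might carry: ownership/permission columns plus a
// serialized DBRef-style reference ({ $ref: <collection>, $id: <id> }).
const mysqlRow = {
  folder_id: 7,
  owner_id: 3,
  doc_ref: { $ref: "documents", $id: "doc-42" }
};

// Resolving the reference when the app needs the document body:
function resolve(ref) {
  return mongoCollections[ref.$ref][ref.$id];
}

console.log(resolve(mysqlRow.doc_ref).title); // "Quarterly report"
```

Permissions queries (joins over users, groups, folders) stay in MySQL; only the final dereference hits MongoDB.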
David provided a good answer. A few things to add to it.
MongoDB's flexible nature permits for easy agile / iterative development.
MongoDB, like Node.js, is asynchronous in nature and works very well within asynchronous environments.
Mongoose is a good ODM (object-document mapper) that makes working with MongoDB from Node.js feel very natural. Unlike ORMs, it is a very thin layer.
For Google Doc like functionality, the flexibility & very rich data structure provided by MongoDB feels like a much better fit.
You can find some good example posts by searching for mongoose, node and MongoDB.
Here's one that also uses backbone.js and looks good http://mattkopala.com/blog/2012/02/12/getting-started-with-nodejs/

Does this schema sound better suited for a document-oriented data store or relational?

Disclaimer: let me know if this question is better suited for serverfault.com
I want to store information on music, specifically:
genres
artists
albums
songs
This information will be used in a web application, and I want people to be able to see all of the songs associated to an album, and albums associated to an artist, and artists associated to a genre.
I'm currently using MySQL, but before I make a decision to switch I want to know:
How easy is scaling horizontally?
Is it easier to manage than an SQL based solution?
Would the above data I want to store be too hard to do schema-free?
When I think association, I immediately think RDBMSs; can data be stored in something like CouchDB but still have some kind of association as stated above?
My web application requires replication, how well does CouchDB or others handle this?
Your data seems ideal for document oriented databases.
Document example:
{
  "type": "Album",
  "artist": "ArtistName",
  "album_name": "AlbumName",
  "songs": [
    { "title": "SongTitle", "duration": 4.5 }
  ],
  "genres": ["rock", "indie"]
}
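To get the associations you asked about (all albums for an artist, etc.) over documents shaped like this, CouchDB uses JavaScript map functions (views). A minimal sketch, simulating emit in plain JavaScript since a real view runs inside CouchDB:

```javascript
// Sample documents; data invented for illustration.
const docs = [
  { type: "Album", artist: "ArtistName", album_name: "AlbumName",
    genres: ["rock", "indie"] },
  { type: "Album", artist: "OtherArtist", album_name: "SecondAlbum",
    genres: ["rock"] }
];

// CouchDB calls the map function once per document and collects
// whatever it emits; here we collect the emitted rows ourselves.
const rows = [];
const emit = (key, value) => rows.push({ key, value });

// The view's map function: index albums by artist.
function map(doc) {
  if (doc.type === "Album") {
    emit(doc.artist, doc.album_name);
  }
}
docs.forEach(map);

// Querying the view with key="ArtistName" then returns that
// artist's albums:
const albumsByArtist = rows
  .filter(r => r.key === "ArtistName")
  .map(r => r.value);
console.log(albumsByArtist); // [ "AlbumName" ]
```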
And replication is one of CouchDB's coolest features ( http://blog.couch.io/post/468392274/whats-new-in-apache-couchdb-0-11-part-three-new )
You might also want to take a look at Riak.
This kind of information is ideally suited to document databases. As with much real-world data, it is not inherently relational, so shoe-horning it into a relational schema will bring headaches down the line (even with an ORM; I speak from experience). Ubuntu already uses CouchDB for storing music metadata, as well as other things, in its One product.
Taking the remainder of your questions one-by-one:
Horizontal scaling is WAY easier than with RDBMS. This is one of the many reasons big sites like Facebook, Digg and LinkedIn are using, or are actively investigating, schema-less databases. For example, sharding (dividing your data across different nodes in a system) works beautifully thanks to a concept called Eventual Consistency; i.e., the data may be inconsistent across nodes for a while, but it will eventually resolve to a consistent state.
It depends what you mean by "manage"... Installation is generally quick and easy to complete. There are no user accounts to configure and secure (this is instead generally done in the application's business logic layer). Working with a document DB in real time can be interesting: there's no ad hoc querying in CouchDB, for example; you have to use the Futon UI or communicate with it via HTTP requests. MongoDB, however, does support ad hoc querying.
I shouldn't think so. Bastien's answer provides a good example of a JSON document serialising some data. The beauty of schemaless DBs is that fields can be missing from one document and present in another, or the documents can be completely different from one another. This avoids the problems associated with the RDBMS null value, which are many and varied.
Yes; the associations are stored as nested documents, which are parsed in your application as object references, collections, etc. In Bastien's answer, the "songs" key identifies an array of song documents.
This is very similar to your first question about horizontal scaling (horizontal scaling and replication are intertwined). As the CouchIO blog post Bastien mentioned states, "Replication … has been baked into CouchDB from the beginning." My understanding is that all document databases handle replication well, and make it easier to set up than an RDBMS does.
Were you to decide you wanted to store the song file itself along with the metadata, you could do that too in CouchDB, by supplying the song file as an attachment to the document; furthermore, you wouldn't have any schema inconsistencies as a result of doing this, because there is no schema!
I hope I haven't made too many missteps here; I'm quite new to document DBs myself.