I have a very large JSON object that I want to put in a NoSQL database.
I would like to know:
first, how do I generate the database schema based on that JSON object?
second, is there a way to put this object into the database automatically, without manually specifying which value (in the JSON object) goes in which column (in the database)?
I hope I was clear enough. Thanks!
Since you haven't specified which NoSQL database you're using, I'll assume MongoDB whenever I talk about things that are implementation-specific.
First off, you should know that NoSQL databases are by nature "schema-less". You could still implement your own schema (in your app, not the DB), but that's optional, and mostly done for validation purposes or to let future developers better understand the planned structure of your data. Read the Dynamic Schemas section in this article to learn more. Here is a SO answer explaining how you would do that in mongoose, and here is the official guide/doc for it.
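For illustration, a minimal mongoose schema could look something like this (the field names are just an assumption, since I don't know your object's shape):

```js
// Optional app-level schema in mongoose (Node.js); the DB itself stays schema-less.
// Field names are made up for illustration.
const mongoose = require('mongoose');

const movieSchema = new mongoose.Schema({
  title:    { type: String, required: true }, // validation lives in the app, not the DB
  year:     Number,
  ratings:  [Number],                         // arrays and nested objects are fine
  metadata: mongoose.Schema.Types.Mixed       // "anything goes" for truly dynamic parts
});

const Movie = mongoose.model('Movie', movieSchema);
```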
Second, NoSQL databases don't work in terms of columns or rows. Rather, you need to think in terms of collections and documents. So to answer your question: yes, when you have a JSON object, you insert it directly (after applying any formatting your schema requires, if you've implemented one as above). You don't enter data value by value (unless you've intentionally set it up that way).
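For example, with the official Node.js driver, inserting your whole JSON object is essentially a one-liner (the connection string, names, and document contents here are made up):

```js
// Insert a JSON object as-is with the official MongoDB Node.js driver.
// Connection string, names, and the document are all made up.
const { MongoClient } = require('mongodb');

async function main() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const bigJsonObject = { title: 'Inception', year: 2010, cast: ['DiCaprio'] };

  // No column mapping anywhere: the document is stored exactly as given.
  await client.db('test').collection('movies').insertOne(bigJsonObject);
  await client.close();
}

main().catch(console.error);
```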
It sounds to me like you need to strengthen your fundamental understanding of how NoSQL works, as you seem to be confusing yourself with concepts that belong to other DBMSs. Here is a neat slideshow to get you started, and the article I linked above also gives you a decent introduction.
After you're done, consider installing MongoDB or something similar and just playing around with the command-line interface to get the hang of it.
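For instance, in the mongo shell you can try something like:

```js
// In the mongo shell: no setup needed, the collection is created on first insert.
db.playground.insertOne({ name: "test", nested: { anything: ["goes", 42] } })
db.playground.find()  // see what you just stored
```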
So this is more of a conceptual question. There might be some fundamental concepts which I don't understand clearly, so please point out any mistakes in my understanding.
I am tasked with designing a framework, and part of it is that I have a MySQL DB and a REST API which acts as the data access layer. Now, the user should be able to parse various data (JSON, CSV, XML, text, source code, etc.) and send it to the REST API, which persists the data to the DB.
Question 1: Should I specify that all data sent to the REST API must be in JSON format, no matter what was parsed? This would (to the best of my understanding) ensure language independence and give the REST API a common format to deal with.
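To make Question 1 concrete, something like this rough sketch is what I have in mind (the route, port, and persist() stub are all hypothetical):

```js
// Rough sketch: one endpoint, every request body must be JSON.
const express = require('express');
const app = express();

app.use(express.json()); // parse all incoming bodies as JSON

// Stub for the actual DB write, just to keep the sketch self-contained.
async function persist(doc) {
  console.log('would persist:', doc);
}

app.post('/data', async (req, res) => {
  await persist(req.body); // req.body is already a plain JS object here
  res.sendStatus(201);
});

app.listen(3000);
```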
Question 2: When it comes to a data model, what should I specify? Is it like a one-model-fits-all sort of thing or is the data model subject to change based on the incoming data?
Question 3: When I think of a relational data model, the thought of foreign keys comes to mind, since they create the relations. Now, it might happen that some data contains no relations at all. For something like customer/order data, the relations are easy to identify. But what if the data has no relations at all? How does the relational model fit in?
Any help/suggestion is greatly appreciated. Thank you!
EDIT:
First off, the data can be both structured (say, XML) and unstructured (say, two text files). I want the DAL to be able to handle and persist whatever data comes in (that's why I thought of a REST interface in front of the DB).
Secondly, I also just recently thought about MongoDB as an option and was looking into it (I have never used NoSQL DBs before). It kind of makes sense to use it if the incoming data in REST is JSON. From what I understood, I can create a collection in Mongo. Does that make more sense than using a relational DB?
Finally, as to what I want to do with the data: I have a tool which performs a sort of difference analysis (think git diff) on the data. Say I send two XML files; the tool retrieves them from the DB, performs the difference analysis, and stores the result back in the DB.
Based on these requirements, what would be the optimum way to go about it?
The answer to this will depend on what sort of data it is. Are all these different data types using different notations for the same data? If so, then storing it in normalised database tables is the way to go. If it's just arbitrary strings that happen to have some form of encoding, then it's probably best to store them raw.
Again, it depends on what you want to do with it afterwards. Are you analysing the data and reporting on it? Are you reading one format and converting to another? Is it all some form of key-value pairs in some notation or other?
No way to answer this further without understanding what you are trying to achieve.
I want to build an application that uses data from several endpoints.
Let's say I have:
JSON API for getting cinema data
XML Export for getting data about ???
Another JSON API for something else
A CSV file for some more stuff ...
In my application I want to bring all this data together and build views for it and so on ...
My idea was to set up a database by creating schemas for all these data sources, so I can write some kind of "import scripts" that I can call whenever I want to fetch the latest data.
I thought of schemas because I want to be able to easily adapt to a new API with any kind of schema.
Please enlighten me about the possibilities and best practices out there (theory and practice if possible :P)
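To make the idea concrete, this is roughly the kind of import script I'm imagining (the endpoint, the response shape, and the storage helper are all invented):

```js
// One "import script" per source: fetch, normalize, store.
// The URL, the response shape, and upsertCinema() are all invented.
const db = {
  // Hypothetical storage layer, stubbed so the sketch runs on its own.
  async upsertCinema(doc) { console.log('would upsert:', doc); }
};

async function importCinemas() {
  // Uses Node 18+'s built-in fetch; the endpoint is made up.
  const res = await fetch('https://api.example.com/cinemas');
  const cinemas = await res.json();

  for (const c of cinemas) {
    // Keep the source's own id so re-running updates instead of duplicating.
    await db.upsertCinema({ externalId: c.id, name: c.name, city: c.city });
  }
}

importCinemas().catch(console.error);
```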
You are totally right to make a database. But the real problem is probably not going to be how to store your data; it's going to be how to make it fit together logically and semantically.
I suggest you first take a good look at what your endpoints can provide. Get several samples from every source and analyze them if you can. How will you know which data is new? How can you match it against existing data and against data from other sources? If existing data changes or gets deleted, how will you detect and handle that? What if sources disagree on something? How and when should you run the synchronization? What will you do if one of your sources goes down? Etc.
It is extremely difficult to make data consistent if your data sources are not. As a rule, if the sources are different, they are not consistent. Thus the proverb "garbage in, garbage out". We, humans, have no problem dealing with small inconsistencies, but algorithms cannot work correctly if there are discrepancies. Even if everything fits together on paper, one usually forgets that data can change over time...
At least that's my experience in such cases.
I'm not sure whether you want to display all the data in the same view in your application or create different views for each of the sources. If you want to display the data in the same view, like a grid, I would recommend using inheritance or an interface, depending on your data and needs. I would recommend setting this structure up in the database too, using different tables for the different sources, with a parent table related to all of them that has a type associated with it.
Here's a good thread with discussion about choosing an interface or inheritance.
Inheritance vs. interface in C#
And here are some examples of representing inheritance in a database.
How can you represent inheritance in a database?
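To sketch the inheritance idea on the application side, it could look roughly like this in JavaScript (all class and field names invented):

```js
// The inheritance idea on the application side; all names invented.
// Each source subclass turns its raw payload into a common row shape,
// so one grid view can render data from every source.
class DataSource {
  constructor(name) { this.name = name; }
  toRows(raw) { throw new Error('subclass must implement toRows()'); }
}

class JsonApiSource extends DataSource {
  toRows(raw) {
    return JSON.parse(raw).map(item => ({ source: this.name, label: item.title }));
  }
}

class CsvSource extends DataSource {
  toRows(raw) {
    return raw.split('\n')
      .filter(line => line.trim() !== '') // skip blank lines
      .map(line => ({ source: this.name, label: line.split(',')[0] }));
  }
}

// The view only ever deals with the common row shape.
const sources = [new JsonApiSource('cinema-api'), new CsvSource('some-csv')];
```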
So I'm thinking of using MongoDB for a project. But I've read about issues it has with relational data: http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
However, I still like that it stores JSON and can have dynamic fields within collections. I like that it resembles the JSON in my app, so I can easily bind to it using libraries like Angular. I use Node.js. I'm digging this whole MEAN (MongoDB, Express, Angular, Node) stack concept.
What I don't like, though, are these relational issues; it seems that it doesn't deal too well with real-world data. Think of a relational app: school/students/teachers with all sorts of relationships between them.
That said, I don't like SQL. I think it's outdated, especially for JSON/JavaScript-based web apps. I don't like having to define types, I don't like that it's hard to dynamically add fields to a table, and I don't like doing joins manually.
So my question is: is there something in the middle? Somewhere between a MySQL database and a MongoDB database? Maybe a normalized JSON store that handles the relationships for me, or a Mongo-like DB that is really fast with references (as opposed to putting everything in the same document).
P.S. I know there are such things as MySQL ORMs, but I want the actual database to store JSON, kind of like MongoDB does, just able to handle relational data as well.
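For reference, this is the kind of manual-reference setup I mean in mongoose, where the "join" happens in the app rather than in the database (model and field names are made up):

```js
// What I mean by "references": normalized documents joined in the app.
const mongoose = require('mongoose');
mongoose.connect('mongodb://localhost/school');

const Teacher = mongoose.model('Teacher', new mongoose.Schema({ name: String }));

const Student = mongoose.model('Student', new mongoose.Schema({
  name:    String,
  teacher: { type: mongoose.Schema.Types.ObjectId, ref: 'Teacher' } // reference, not embedding
}));

// populate() does the "join" with an extra query under the hood,
// which is exactly the part I wish the database handled natively and fast.
Student.find().populate('teacher').exec().then(students => {
  console.log(students[0].teacher.name);
});
```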
Have a look at ElasticSearch. It's much more powerful than MongoDB, scales better, and it supports nested documents and joins, which are not really the same as in a relational database but, like you say, somewhere in between.
http://www.elasticsearch.com/
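For a taste, here is roughly what a nested mapping and a nested query look like against Elasticsearch's REST API. This is just a sketch for recent Elasticsearch versions, with index and field names invented:

```js
// Sketch against Elasticsearch's REST API (syntax for recent versions);
// index and field names are invented. Uses Node 18+'s built-in fetch.
async function demoNested() {
  // Create an index where each course is indexed as its own hidden sub-document.
  await fetch('http://localhost:9200/students', {
    method: 'PUT',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      mappings: {
        properties: {
          name:    { type: 'text' },
          courses: { type: 'nested' }
        }
      }
    })
  });

  // Query inside the nested documents: a join-ish operation a flat store can't do.
  const res = await fetch('http://localhost:9200/students/_search', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      query: {
        nested: {
          path: 'courses',
          query: { match: { 'courses.title': 'math' } }
        }
      }
    })
  });
  console.log(await res.json());
}

demoNested().catch(console.error);
```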
This is maybe more of an implementation question. Suppose I were to make a tool to convert one kind of relational database to some other kind of database. What would the approach be?
If, for example, I want to convert the data and structure from a MySQL database to MSSQL, would I need to use regular expressions to parse the SQL file? Or could I convert it to XML or JSON first and from that structure parse it into my target database?
Using existing tools for converting MySQL to MSSQL or anything similar is out of scope, since I want to know how it is actually done.
Well, it's kind of a broad question, but generally speaking, having your own abstract representation of the structure and data would be a good thing: you could extend your system "easily" by writing importers and exporters, and decouple your code a little by abstracting the relational DB concepts into your own format.
The importers would "reverse engineer" a given database by converting it to your own representation (as you say, XML/JSON, or even your own intermediate language, which would probably be better). Then the exporters would just convert from your format to the requested SQL dialect. No regular expressions, no other hardcoded stuff.
This will allow you to extend your system to support a bigger number of sources and targets, and also to handle errors, like a SQL feature from a "source" that is not supported in the selected "target".
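For illustration, the intermediate representation could be as simple as plain objects, with one importer and one exporter per dialect. Everything below is invented as a sketch:

```js
// The neutral format: plain data objects, no SQL dialect anywhere.
// All names and the type mapping are invented for illustration.
const schema = {
  tables: [{
    name: 'users',
    columns: [
      { name: 'id',   type: 'integer', primaryKey: true },
      { name: 'name', type: 'string',  maxLength: 100 }
    ]
  }]
};

// An exporter turns the neutral format into one concrete dialect.
function toMssql(schema) {
  return schema.tables.map(t => {
    const cols = t.columns.map(c => {
      const type = c.type === 'string' ? `NVARCHAR(${c.maxLength})` : 'INT';
      return `  ${c.name} ${type}${c.primaryKey ? ' PRIMARY KEY' : ''}`;
    }).join(',\n');
    return `CREATE TABLE ${t.name} (\n${cols}\n);`;
  }).join('\n\n');
}

// An importer would do the reverse: read the source's information_schema
// (rather than string-parsing dump files) and build the same neutral objects.
console.log(toMssql(schema));
```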
My 2 cents, hope it helps!
I have an existing schema definition in a MySQL database. I created the schema using MySQL Workbench.
I wish to access the schema from my Lift-Scala-Squeryl code. I know that a simple way would be to manually define the schema structure using Squeryl data objects.
Is there an automated way to generate Squeryl data objects out of an existing MySQL schema?
I've found the following general question, but I'm sure there could be a way to generate a naive structure; although not accurate, it would allow a better starting point for the manual work.
Thanks, David.
Max, Squeryl's creator, had suggested that this would be a good idea a while back. Here is the google group discussion.
You may not be too pleased with me for this, but I think I talked him out of it :) So, to my knowledge, there isn't a way to do it. Besides the issues I pointed out in that thread, the fact that Squeryl can work in multiple modes (primitive types, custom types, Lift Record types) would make it a difficult thing to do and get right for everyone.