Risks in using Neo4j as a stand-alone database

I have seen quite a few products with graph-related data models built on both Neo4j and a relational or document database. The other db is generally used to store the metadata of each node.
I am considering building a product relying entirely on Neo4j, storing all my objects' metadata as node properties. Is there any caveat in doing so?

It depends entirely on how much metadata you want to store. 10 primitive / short String properties per node is absolutely fine. 1000 large JSON documents per node... not so much. It isn't a document store.
What sort of numbers are we talking about? I would suggest you generate a random graph with a similar number of properties and similar values to those you wish to have in your product, and see how it performs.
Otherwise no caveats, I would say. Oh, don't refer to internal Neo4j node IDs anywhere; unlike in a relational database, these get re-used once nodes are deleted.
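A minimal sketch of the suggested experiment, assuming Python: it generates random nodes carrying roughly the number of properties you expect, each with an application-level uuid property so that nothing outside the database depends on Neo4j's internal node IDs. The function names, property names and sizes are made up for illustration; loading the result into Neo4j (through a driver or a CSV import, for example) is left out.

```python
import random
import string
import uuid


def make_random_nodes(count, props_per_node, value_length=12, seed=42):
    """Return a list of dicts, one per node, with random short string properties."""
    rng = random.Random(seed)  # seeded so the test data is reproducible
    nodes = []
    for _ in range(count):
        # Application-level identifier: refer to this, not to Neo4j's internal ID.
        node = {"uuid": str(uuid.UUID(int=rng.getrandbits(128), version=4))}
        for i in range(props_per_node):
            node[f"prop_{i}"] = "".join(
                rng.choices(string.ascii_letters, k=value_length)
            )
        nodes.append(node)
    return nodes


def make_random_edges(nodes, edge_count, seed=42):
    """Return (source_uuid, target_uuid) pairs between randomly chosen nodes."""
    rng = random.Random(seed)
    ids = [n["uuid"] for n in nodes]
    return [(rng.choice(ids), rng.choice(ids)) for _ in range(edge_count)]


# Hypothetical sizes: adjust them to match the product you have in mind.
nodes = make_random_nodes(count=1000, props_per_node=10)
edges = make_random_edges(nodes, edge_count=5000)
```

Scaling count, props_per_node and value_length up towards your real numbers, then timing typical queries against the loaded graph, should show fairly quickly whether property storage becomes a bottleneck.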

Related

Best IoT Database?

I have many IoT devices currently sending data to a MySQL database.
I want to move it to another database that is open source and provides:
JSON support
Scalability
Flexibility to add multiple columns automatically as per payload
Python and PHP Support
Extremely Fast Read, Write
Ability to export at least 6 months of data in CSV format
Please revert back soon.
Any help will be appreciated.
Thanks

Shaping your database based on input data is a mistake. Tomorrow your data may arrive as CSV or XML, or in a slightly different format. Design your database based on your abstract data model, normalize it, and map the existing data onto that model. Shape your structure based on the input you have and the output you plan to get. If you only ever retrieve the same content that came in, storing the data in files will be sufficient; you don't need a database.
Also, you don't want to store "raw" records in the database. Even if your database can compose a data record out of the raw element at run time, you cannot select on a particular extracted element without visiting all the records.
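To illustrate the point about raw records, here is a small sketch using Python's built-in sqlite3 module (an assumption made so the example runs without a server; the table, the field names and the sample messages are invented). The raw payload is kept only as a reference copy, while the fields you filter on are extracted into their own indexed columns at insert time.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE reading (
        id          INTEGER PRIMARY KEY,
        device_id   TEXT NOT NULL,   -- extracted from the payload
        recorded_at TEXT NOT NULL,   -- extracted from the payload
        temperature REAL,            -- extracted from the payload
        raw_payload TEXT             -- original message, kept for reference only
    )
    """
)
# The index is what lets the database avoid visiting every record.
conn.execute("CREATE INDEX idx_reading_device ON reading (device_id, recorded_at)")


def store(payload_text):
    """Parse one incoming message and store its extracted fields."""
    msg = json.loads(payload_text)
    conn.execute(
        "INSERT INTO reading (device_id, recorded_at, temperature, raw_payload) "
        "VALUES (?, ?, ?, ?)",
        (msg["device"], msg["ts"], msg.get("temp"), payload_text),
    )


# Hypothetical sample messages.
store('{"device": "d1", "ts": "2024-01-01T00:00:00", "temp": 21.5}')
store('{"device": "d2", "ts": "2024-01-01T00:00:05", "temp": 19.0}')
store('{"device": "d1", "ts": "2024-01-01T00:01:00", "temp": 21.7}')

rows = conn.execute(
    "SELECT recorded_at, temperature FROM reading "
    "WHERE device_id = ? ORDER BY recorded_at",
    ("d1",),
).fetchall()
```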
Most databases allow you to connect from anywhere (there is no such thing as better PostgreSQL support in Java than in Python, although the quality and level of standardization of the drivers may vary). The question is which features your driver needs to support. For example, you may require support for bulk import (don't issue large sets of INSERT statements to the database).
What you should actually look at is:
scalability: can your database grow with your data? Would the DB benefit from additional CPUs (MySQL in particular doesn't for large queries)? Can you shard your database across multiple instances? (MySQL again fails to handle that.)
does your model look like a snowflake? If yes, you may consider NoSQL; otherwise stay away from it. If you manage to model your data as a snowflake (and this means you are open to compromises), you may use anything like Lucene-based search products, Mongo, Cassandra, etc. The fact that you have time series doesn't by itself qualify you for NoSQL. For example, you may have 10K devices issuing 5K message types. Specific data is redundantly recorded at device level and at message type level. In that case, because of the n:m relation, you don't have a snowflake anymore.
why do you store the data? What queries are you going to issue?

Why do you want to move away from MySQL? It is open source and can meet all of the criteria you listed above. This is a very subjective question, so it's hard to give a good answer, but MySQL is not a bad option.

What are the drawbacks to a JSON server-side data storage file?

I'm looking for alternative data storage methods to SQL (That is to say, I do not want to use SQL, even for queries) and came across a few based on JSON. Talking with friends who do database work, they said I shouldn't consider these, but wouldn't elaborate. What are the potential (and practical) drawbacks to using JSON as a data storage file format?
I figured JSON would be better than SQL for these reasons:
JSON is strictly defined and doesn't have flavors (Oracle, Microsoft, MySQL, etc.)
Since Google started making Chrome, JS interpreters have made reading, parsing, and outputting JS (and thus JSON) a very fast and easy process.
Database output could be pure JSON, erasing the need for a middle-man interpreter for browsers, etc.
among others...

I think you might want to take a look at NoSQL databases:
https://en.wikipedia.org/wiki/NoSQL
If you like using JSON-like data, then one I have personally used is MongoDB.
I have not used it as a main/single source of my app data, but only for secondary purposes. But, I guess, you can try using it as your main data storage too (I think many people do).
What I have tried, and found quite satisfying, was MongoDB with C#, using MongoVUE as a GUI application for executing queries and interacting with the DB. I was not very happy with MongoVUE, but it seemed to be the best option at the time.
However, SQL DBs are very good at defining relationships in your data. E.g. referencing an entry on table A from an entry on table B, and that kind of stuff. Using those relationships, you can join tables and do many interesting things. I think, it is good for you to get some experience on this field as well.
MongoDB is not built for defining relationships (as far as I understand). It has the concept of "documents", where you store information in a JSON-like format (with nested key/values). You can query documents, but joining seems like hacking your way around its normal usage: How do I perform the SQL Join equivalent in MongoDB?
Also, ensuring data consistency (in a truly reliable manner) when using relationships in MongoDB seems pretty much impossible to me. But even if I am wrong and it is possible, achieving it will be ten times harder than with SQL DBs.
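As a small illustration of the referential-integrity point, the sketch below uses Python's built-in sqlite3 module (chosen only because it needs no server; the table names and data are hypothetical). With foreign keys enabled, the relational database itself rejects a row that points at a non-existent parent, which is the kind of check you would otherwise have to write by hand on top of plain JSON documents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE post ("
    " id INTEGER PRIMARY KEY,"
    " author_id INTEGER NOT NULL REFERENCES author(id),"
    " title TEXT NOT NULL)"
)

conn.execute("INSERT INTO author (id, name) VALUES (1, 'alice')")
conn.execute("INSERT INTO post (author_id, title) VALUES (1, 'valid reference')")

try:
    # Author 99 does not exist, so the database refuses the row.
    conn.execute("INSERT INTO post (author_id, title) VALUES (99, 'dangling reference')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Joining across the relationship is a single query.
joined = conn.execute(
    "SELECT author.name, post.title FROM post JOIN author ON author.id = post.author_id"
).fetchall()
```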
You can have a look at the list on Wikipedia, though; there might be a better alternative than MongoDB for you.
You can also use pure JSON with no DB system at all.
So, in summary, JSON-like storage has (at least) these issues:
Not good at defining and utilizing relationships
When using relationships, data integrity (or more likely, referential integrity) is hard.
If you are not using a good DB system but just dump JSON into a file, you will have performance issues when that file becomes too big. Imagine querying a 1 GB JSON-encoded array of objects to get the ones you want. You will have to load the entire array into memory, run through the whole of it (since you will have no indices) and then (if you have not run out of memory and your connection, when using a network, has not expired) you will get a result. Most NoSQL DBs like MongoDB and most SQL DBs have no such problems (at least within reasonable amounts of data). They are fine-tuned, they support indexing, references, permissions and roles, and you can also define code that executes at the DB level (like triggers and stored procedures). Certainly they are more complex, but that complexity is often required to achieve the end result.
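The performance point can be made concrete with a short sketch. With a bare JSON file, every lookup has to parse the whole array and walk through it; an index (here just a Python dict, standing in for what a database maintains for you) answers the same question without a scan. The record layout and field names are invented for the example.

```python
import json

# Stand-in for the contents of a large JSON file: one big array of objects.
json_text = json.dumps(
    [{"id": i, "owner": f"user{i % 100}", "size": i * 10} for i in range(10_000)]
)


def find_by_owner_scan(text, owner):
    """No index: parse everything into memory, then visit every record."""
    records = json.loads(text)
    return [r for r in records if r["owner"] == owner]


def build_owner_index(records):
    """What a database index does for you: map a key to the matching records."""
    index = {}
    for r in records:
        index.setdefault(r["owner"], []).append(r)
    return index


scan_hits = find_by_owner_scan(json_text, "user7")

owner_index = build_owner_index(json.loads(json_text))  # built once, reused per query
index_hits = owner_index.get("user7", [])
```

Both approaches return the same records; the difference is that the scan repeats the full parse and walk on every query, while the index pays that cost once.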

JSON, or JavaScript Object Notation, is an open standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs. It is used primarily to transmit data between a server and web application, as an alternative to XML.
What you are really comparing is a database versus flat-file storage.

Even when using a relational DB, data integrity (or referential integrity) is still hard because rows are usually timestamped. Quite often foreign keys are not enforced because of this. When a row update occurs you have two choices. Firstly, 'forget' the previous version. Secondly, update the original row and copy the previous version into a timestamped 'non-relational' history table where foreign keys are useless. Most business data requires updates. Features for maintaining referential integrity in relational databases are useless for this type of business data (which represents most enterprise data).
What is needed is a Temporal Database, or an abstraction layer which presents a user with the appropriate version of a row based on a Time context. Ideally in 2 dimensions i.e. transaction time and business time (aka valid time).
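A minimal sketch of such an abstraction layer, assuming an in-memory list of row versions (all names and sample data are hypothetical). Each version carries a valid-time interval (when the fact was true in the business sense) and a transaction-time interval (when the database believed it); a lookup picks the version that matches both time contexts.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

FOREVER = date.max  # open-ended interval marker


@dataclass(frozen=True)
class Version:
    key: str
    value: str
    valid_from: date   # business (valid) time, inclusive
    valid_to: date     # exclusive
    tx_from: date      # transaction time, inclusive
    tx_to: date        # exclusive


def as_of(versions: List[Version], key: str, valid_at: date, known_at: date) -> Optional[str]:
    """Return the value valid at `valid_at`, as the database knew it at `known_at`."""
    for v in versions:
        if (
            v.key == key
            and v.valid_from <= valid_at < v.valid_to
            and v.tx_from <= known_at < v.tx_to
        ):
            return v.value
    return None


history = [
    # Recorded on 1 Jan: the customer lives in Paris from 1 Jan onwards.
    Version("cust-1", "Paris", date(2020, 1, 1), FOREVER, date(2020, 1, 1), date(2020, 3, 1)),
    # Correction recorded on 1 Mar: Paris only until 1 Feb, then Berlin.
    Version("cust-1", "Paris", date(2020, 1, 1), date(2020, 2, 1), date(2020, 3, 1), FOREVER),
    Version("cust-1", "Berlin", date(2020, 2, 1), FOREVER, date(2020, 3, 1), FOREVER),
]

# Same business date, two different transaction-time viewpoints.
believed_in_february = as_of(history, "cust-1", date(2020, 2, 15), date(2020, 2, 20))
believed_in_april = as_of(history, "cust-1", date(2020, 2, 15), date(2020, 4, 1))
```

Nothing is ever overwritten here: a correction closes the transaction-time interval of the old version and adds new ones, so earlier answers remain reproducible.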

Storing massive EXIF and IPTC attributes in database

I'm writing an app that needs to handle more than 15,000 photos, and I want to store their EXIF and IPTC attributes in the database.
My initial approach is to use MySQL and create a table to store all the attributes, as suggested here.
However, most of the photos have up to 250 attributes. Since I have 15k photos, that means I will have almost 4 million rows. And this is only the beginning (I expect more photos in the future).
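For reference, the attribute-table layout under discussion could be sketched like this. The example uses Python's built-in sqlite3 module rather than MySQL so that it runs without a server, and the table, column and attribute names are assumptions. The row estimate is simply 15,000 photos × 250 attributes = 3,750,000 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE photo (
        id       INTEGER PRIMARY KEY,
        filename TEXT NOT NULL
    );
    -- One row per (photo, attribute) pair.
    CREATE TABLE photo_attribute (
        photo_id INTEGER NOT NULL REFERENCES photo(id),
        name     TEXT NOT NULL,      -- e.g. an EXIF or IPTC tag name
        value    TEXT,
        PRIMARY KEY (photo_id, name)
    );
    -- Supports searches of the form "which photos have attribute X = Y".
    CREATE INDEX idx_attr_name_value ON photo_attribute (name, value);
    """
)

# Hypothetical sample data.
conn.execute("INSERT INTO photo (id, filename) VALUES (1, 'a.jpg'), (2, 'b.jpg')")
conn.executemany(
    "INSERT INTO photo_attribute (photo_id, name, value) VALUES (?, ?, ?)",
    [
        (1, "Make", "Canon"),
        (1, "ISO", "200"),
        (2, "Make", "Nikon"),
        (2, "ISO", "200"),
    ],
)

matches = conn.execute(
    "SELECT photo.filename FROM photo "
    "JOIN photo_attribute ON photo_attribute.photo_id = photo.id "
    "WHERE photo_attribute.name = ? AND photo_attribute.value = ? "
    "ORDER BY photo.filename",
    ("Make", "Canon"),
).fetchall()

expected_rows = 15_000 * 250  # upper bound on attribute rows for the current collection
```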
I wonder whether MySQL would be ok in this scenario or I should move to a NoSQL approach like MongoDB.
Please also note that I need to make the database searchable.
Thanks in advance.

If you're a .NET developer, RavenDB is ideal for your scenario. It can easily handle that volume on very modest hardware, and has outstanding search capabilities provided by its internal use of the Lucene search engine.
The photos themselves would be stored as attachments, while the attributes would be part of the document.
Even if you're not a .NET developer, RavenDB can be used over HTTP/REST from any language. It's just much easier with the native .NET client.

Which database suits my application, MySQL or MongoDB? Using Node.js, Backbone, Now.js

I want to make an application like docs.google.com (without its API, completely on my own server) using
frontend: Backbone
backend: Node
Which database do you think is better, MySQL or MongoDB? It should scale well.
I am familiar with MySQL with PHP and I will be happy if the answer is MySQL.
But many tutorials I saw used MongoDB. Why did they use MongoDB rather than MySQL?
What should I use?
Can anyone give me a link to a sample application (with source) built using Backbone, Node and MySQL (or Mongo), or at least an app with Node and MySQL?
Thanks

With MongoDB, you can just store JSON objects and retrieve them fully-formed, so you don't really need an ORM layer and you spend less CPU time translating your data back and forth. The developers behind MongoDB have also made horizontally scaling the database a higher priority and let you run arbitrary JavaScript code to pre-process data on the DB side (allowing map-reduce style filtering of data).
But you give up some things for these gains: you can't join records. Actually, the JSON structure you store could only be produced via joins in SQL, but in MongoDB you only have that one structure for your data, while in SQL you can query differently and get your data represented in alternate ways much more easily. So if you need to do a lot of analytics on your database, MongoDB will make that harder.
The query language in MongoDB is "rougher", in my opinion, than SQL's, partly because it's less familiar, and partly because the querying features "feel" haphazardly put together, partially to make it valid JSON, and partially because there are literally a couple of ways of doing the same thing, and some are older ways that aren't as useful or regularly formatted as the others. And there's the added complexity of the array and sub-object types over SQL's simple row-based design, so the syntax has to be able to handle querying for arrays that contain some of the values you defined, contain all of the values you defined, contain only the values you defined, and contain none of the values you defined. The same distinctions apply to object keys and their values, and this makes the query syntax harder to grasp. (And while I can see the need for edge cases, the $where query parameter, which takes a JavaScript function that is run on every record of the data and returns a boolean, is a siren song: you can easily define which objects you want returned, but it has to run on every record in the database, and no indexes can be used.)
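The four array-matching cases mentioned above are easy to confuse, so here they are spelled out as plain Python predicates. This is only an illustration of the semantics, not MongoDB query syntax; the function names and sample documents are invented. (In MongoDB, the "some", "all" and "none" cases correspond to the $in, $all and $nin operators.)

```python
def contains_some(field, wanted):
    """At least one wanted value appears in the array."""
    return bool(set(field) & set(wanted))


def contains_all(field, wanted):
    """Every wanted value appears in the array (extra values are allowed)."""
    return set(wanted) <= set(field)


def contains_only(field, wanted):
    """The array holds nothing outside the wanted values."""
    return set(field) <= set(wanted)


def contains_none(field, wanted):
    """No wanted value appears in the array."""
    return not (set(field) & set(wanted))


# Hypothetical documents with an array-valued field.
docs = [
    {"_id": 1, "tags": ["draft", "shared"]},
    {"_id": 2, "tags": ["draft"]},
    {"_id": 3, "tags": ["archived"]},
]
wanted = ["draft", "shared"]

some_ids = [d["_id"] for d in docs if contains_some(d["tags"], wanted)]
all_ids = [d["_id"] for d in docs if contains_all(d["tags"], wanted)]
only_ids = [d["_id"] for d in docs if contains_only(d["tags"], wanted)]
none_ids = [d["_id"] for d in docs if contains_none(d["tags"], wanted)]
```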
So, it depends on what you want to do, but since you say it's for a Google Docs clone, you probably don't care about any representation but the document representation, itself, and you're probably only going to query based on document ID, document name, or the owner's ID/name, nothing too complex in the querying.
Then, I'd say being able to take the JSON representation of the document your user is editing, and just throw it into the database and have it automatically index these important fields, is worth the price of learning a new database.

I was also struggling with this choice looking at the hype created by using MongoDB for tasks it was not built for. So my 2 cents are:
Storing and retrieving hierarchical objects, which your documents probably are, is easier in MongoDB, as David says. It becomes more complicated if you want to store documents that are bigger than 16 MB, though; MongoDB's answer is GridFS.
Organising documents in folders and groups, and keeping track of which user owns which documents and whom he/she has given access to, is definitely easier with MySQL: you have the advantage of powerful SQL queries with joins etc., built-in EXPLAIN optimization, triggers, functions, stored procedures, etc. MongoDB is nowhere near.
So what prevents you from using both MySQL to organize the documents and MongoDB to store one collection of documents identified by id (or several collections - one for each document type)? It seems to me the best choice and using two databases in one application is not a problem, really.
MySQL will store users, groups, folders, permissions - whatever you fancy - and for each document it will store a reference to the collection and the document id (MongoDB has a special format for it - DBRefs). MongoDB will store documents themselves in collections, if they are all less than 16MB, or the previews and metadata of documents in collections and the whole documents in GridFS.
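A rough sketch of that split, with stand-ins so it runs anywhere: Python's sqlite3 plays the role of MySQL, and a plain dict plays the role of a MongoDB collection. The schema, the doc_store dict and the helper names are all hypothetical; a real setup would use the respective database drivers.

```python
import sqlite3
import uuid

# Relational side: users, ownership, and a pointer to each document body.
sql = sqlite3.connect(":memory:")
sql.executescript(
    """
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE document (
        id         INTEGER PRIMARY KEY,
        owner_id   INTEGER NOT NULL REFERENCES user(id),
        title      TEXT NOT NULL,
        collection TEXT NOT NULL,   -- which document collection holds the body
        doc_id     TEXT NOT NULL    -- identifier of the body in that collection
    );
    """
)

# Document side: a dict standing in for collections of JSON-like documents.
doc_store = {"text_documents": {}}


def create_document(owner_id, title, body):
    """Store the body in the document store and its metadata in SQL."""
    doc_id = uuid.uuid4().hex
    doc_store["text_documents"][doc_id] = body
    sql.execute(
        "INSERT INTO document (owner_id, title, collection, doc_id) VALUES (?, ?, ?, ?)",
        (owner_id, title, "text_documents", doc_id),
    )
    return doc_id


def documents_for(owner_name):
    """Use SQL for the relationship query, then fetch bodies by reference."""
    rows = sql.execute(
        "SELECT document.title, document.collection, document.doc_id "
        "FROM document JOIN user ON user.id = document.owner_id "
        "WHERE user.name = ? ORDER BY document.title",
        (owner_name,),
    ).fetchall()
    return [(title, doc_store[coll][doc_id]) for title, coll, doc_id in rows]


sql.execute("INSERT INTO user (id, name) VALUES (1, 'alice')")
create_document(1, "Notes", {"paragraphs": ["hello", "world"]})
result = documents_for("alice")
```

One thing to keep in mind with this design is that the two writes in create_document are not atomic across the two stores, so a failure between them can leave an orphaned body or a dangling reference that the application has to clean up.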

David provided a good answer. A few things to add to it.
MongoDB's flexible nature allows for easy agile / iterative development.
MongoDB, like Node.js, is asynchronous in nature and works very well within asynchronous environments.
Mongoose is a good ODM (object document mapper) that makes working with MongoDB from Node.js feel very natural. Unlike ORMs, this is a very thin layer.
For Google Docs-like functionality, the flexibility and very rich data structure provided by MongoDB feel like a much better fit.
You can find some good example posts by searching for mongoose, node and MongoDB.
Here's one that also uses backbone.js and looks good http://mattkopala.com/blog/2012/02/12/getting-started-with-nodejs/

What are some cons of storing html in a database for use?

Although it's very easy to search for this topic, it's not as easy to come to a conclusion. What are some cons of storing HTML in a database for use?

HTML is static, and querying the data from a database uses database resources; database resources are typically among the more restricted on moderate to heavy use systems, therefore it makes sense to not store HTML in the database, but to place it on the filesystem, where it can be retrieved without using critical resources.

In the broadest sense, HTML is a document markup language and serves to structure data into a document. The database on the other hand should contain raw data organized along its logical relations. Documents use formatting and may present data redundantly, but the true, underlying data is always fixed. Thus you should store the most immediate, raw form of data that you possibly can, and retrieve it in meaningful ways using both the query language itself to create suitable views for your purposes, and other, output-specific data processing to generate documents.
Of course you may like to cache the result of an output formatting operation, and you may choose to store the cache in a database, too. That's fine of course. But concerning the raw payload data, I would always go for the above.
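A small sketch of that separation, with made-up names: the raw record is the stored source of truth, HTML is generated from it on demand, and the rendered result is kept in a cache (here a dict; it could equally be a database table) that can be thrown away and rebuilt.

```python
import html

# Raw data, as it would be stored in the database (hypothetical record).
articles = {
    1: {"title": "Tea & Coffee", "body": "Use <b> sparingly."},
}

html_cache = {}  # rendered output; safe to discard at any time


def render_article(article_id):
    """Return the HTML for an article, rendering from raw data only on a cache miss."""
    if article_id not in html_cache:
        record = articles[article_id]
        html_cache[article_id] = (
            f"<article><h1>{html.escape(record['title'])}</h1>"
            f"<p>{html.escape(record['body'])}</p></article>"
        )
    return html_cache[article_id]


def update_article(article_id, **changes):
    """Change the raw data and invalidate the cached rendering."""
    articles[article_id].update(changes)
    html_cache.pop(article_id, None)


first = render_article(1)
update_article(1, title="Tea")
second = render_article(1)
```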

That depends on the use of the HTML in the database. If it's data that you only ever access as a blob (meaning you never/rarely query the contents of the HTML), then I think it can be a good idea in some cases. Then the question is essentially the same as "Should I store files in xyz format in my database?" And the answer to questions like that depends on several things:
How large are the files? Would storing them on the filesystem, with just their filename/path in the DB be more efficient?
Do you need to replicate the data to other servers? If so, then storing raw files in the DB may be easier than on the FS, if you already have DB-sync infrastructure in place.
What are your query uses like? Are they friendlier to a DB or a file system storage?
Now, if you're talking about storing HTML data that you frequently have to query, that changes the game entirely.
Any database normalization nazi would tell you never to do it. But there might be cases when it's useful. For instance, if you're using some sort of full-text searching engine, you may want that in a database--or in whatever form the full-text search engine uses.
