Storing serialized Ruby objects in a database (MySQL)

I would like to store very large sets of serialized Ruby objects in a database (MySQL).
1) What are the cons and pros?
2) Is there any alternative way?
3) What are the technical difficulties if the objects are really big?
4) Will I face memory issues while serializing and de-serializing if the objects are really big?

Pros
Allows you to store arbitrarily complex objects
Simplifies your db schema (no need to model those complex objects in tables)
Cons
Complicates your models and data layer
Potentially need to handle multiple versions of serialized objects (changes to object definition over time)
Inability to directly query serialized columns
Alternatives
As the previous answer stated, an object database or document oriented database may meet your requirements.
Difficulties
If your objects are quite large, you may run into difficulties when moving data between your DBMS and your program. You could minimize this by separating the storage of the object data from the metadata related to the object.
Memory Issues
Running out of memory is definitely a possibility with large enough objects. It also depends on the type of serialization you use. To know how much memory you'd be using, you'd need to profile your app. I'd suggest ruby-prof, bleak_house or memprof.
I'd suggest using a non-binary serialization wherever possible. You don't have to use a single serialization format for your entire database, but mixing formats can get complex and messy.
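To make this concrete, here is a minimal sketch of the pattern in Python (the question is about Ruby, but the idea is the same): keep queryable metadata in ordinary columns, store the serialized object as JSON in a text column, and deserialize on read. The connection settings, table, and column names are invented for illustration.

import json
import MySQLdb  # pip install mysqlclient

# Hypothetical table:
#   CREATE TABLE objects (
#       id INT AUTO_INCREMENT PRIMARY KEY,
#       kind VARCHAR(50),     -- queryable metadata
#       created_at DATETIME,  -- queryable metadata
#       data LONGTEXT         -- serialized object, opaque to SQL
#   );
conn = MySQLdb.connect(host="localhost", user="app", passwd="secret", db="appdb")

def save_object(kind, obj):
    """Serialize obj as JSON (non-binary, easier to version) and store it."""
    cur = conn.cursor()
    cur.execute(
        "INSERT INTO objects (kind, created_at, data) VALUES (%s, NOW(), %s)",
        (kind, json.dumps(obj)),
    )
    conn.commit()

def load_object(object_id):
    """Fetch the text column and deserialize it back into a Python object."""
    cur = conn.cursor()
    cur.execute("SELECT data FROM objects WHERE id = %s", (object_id,))
    row = cur.fetchone()
    return json.loads(row[0]) if row else None

Note how the kind and created_at columns stay queryable even though the data column cannot be filtered in SQL, which is exactly the "inability to directly query serialized columns" trade-off listed above.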
If this is how you want to proceed, an object-oriented DBMS like ObjectStore or a document-oriented DBMS like CouchDB would probably be your best option; they are better designed for, and targeted at, object serialization.

As an alternative you could use any of the multitude of NoSQL databases. If you can serialize your object to JSON then it should be easily stored in CouchDB.

You have to bear in mind that serialized objects take far more disk space than data you save and load in your own format. Hard-drive I/O is very slow, so if you're looking at complex objects that take a lot of processing power, it may actually be faster to load the file(s) and process them on each startup, or to save the data in a form that is easy to load.

Related

Storage: database vs in-memory objects vs in-memory database

I'm doing a project where I have to store data for a Node.js Express server. It's not a LOT of data, but I have to save it somewhere.
I always hear that a database is good for that kind of stuff, but I thought about just keeping all the data in objects in Node.js and backing them up to disk as JSON every minute (or 5 minutes). Would that be a good idea?
What I'm thinking here is that the response time from objects like that is way faster than from a database, and saving them is easy. But then I heard that there are in-memory databases as well, so my question is:
Are in-memory databases faster than JavaScript objects? Are JSON-based data backups a good idea in this respect? Or should I simply go with a normal database because performance doesn't really matter in this case?
Thanks!
If this is nothing but a school assignment or toy project with very simple models and access patterns, then sure, rolling your own data persistence might make sense.
However, I'd advocate for using a database if:
you have a lot of objects or different types of objects
you need to query or filter objects by various criteria
you need more reliable data persistence
you need multiple services to access the same data
you need access controls
you need any other database feature
Since you ask about speed, for trivial stuff, in-memory objects will likely be faster to access. But, for more complicated stuff (lots of data, object relations, pagination, etc.), a database could start being faster.
You mention in-memory databases, but those are only used when you want the database features without the persistence; they would be closer to your in-memory objects, just without the file writing. So it depends on whether you care about keeping the data or not.
Also, if you haven't ever worked with any kind of database, now's a perfect time to learn :).
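For illustration, a minimal sketch of the snapshot approach the question describes, written in Python rather than Node (the idea carries over directly); the file name and interval are arbitrary:

import json
import threading

store = {}  # all application data lives in memory

def snapshot(path="backup.json", interval=60):
    """Write the in-memory store to disk as JSON every `interval` seconds."""
    with open(path, "w") as f:
        json.dump(store, f)
    # Re-arm the timer so snapshots keep happening.
    threading.Timer(interval, snapshot, args=(path, interval)).start()

def restore(path="backup.json"):
    """Reload the last snapshot on startup, if one exists."""
    global store
    try:
        with open(path) as f:
            store = json.load(f)
    except FileNotFoundError:
        store = {}

restore()
snapshot()

A crash loses up to `interval` seconds of writes, which is exactly the "more reliable data persistence" point in the list above.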
What I'm thinking here is that the response time from objects like that is way faster than from a database, and saving them is easy.
That's not true. Databases are persistent storage; there will always be I/O latency. I would recommend MySQL for a SQL database and MongoDB or Cassandra for NoSQL.
An in-memory database is definitely faster, but again, you need persistent storage for that data. Redis is a very popular in-memory database.
MongoDB stores data in BSON, a binary JSON-like format, so it would be a good choice in your case.
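As a tiny illustration of the Redis suggestion above, here is a Python sketch using the redis-py client (the key name is made up); note that Redis can also be configured to persist to disk via snapshots or an append-only file:

import json
import redis  # pip install redis

r = redis.Redis(host="localhost", port=6379)

# Store an object as JSON under a key, then read it back.
r.set("user:42", json.dumps({"name": "Ada", "visits": 3}))
user = json.loads(r.get("user:42"))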

What are the drawbacks to a JSON server-side data storage file?

I'm looking for alternative data storage methods to SQL (That is to say, I do not want to use SQL, even for queries) and came across a few based on JSON. Talking with friends who do database work, they said I shouldn't consider these, but wouldn't elaborate. What are the potential (and practical) drawbacks to using JSON as a data storage file format?
I figured JSON would be better than SQL for these reasons:
JSON is strictly defined and doesn't have flavors (Oracle, Microsoft, MySQL, etc.)
Since Google started making Chrome, JS interpreters have made reading, parsing, and outputting JS (and thus JSON) a very fast and easy process.
Database output could be pure JSON, erasing the need for a middle-man interpreter for browsers, etc.
among others...
I think you might want to take a look at NoSQL databases:
https://en.wikipedia.org/wiki/NoSQL
If you like using JSON-like data, then one I have personally used is MongoDB.
I have not used it as the main/single source of my app data, but only for secondary purposes. But I guess you can try using it as your main data storage too (I think many people do).
What I tried, and found quite satisfying, was MongoDB with C#, using MongoVUE as a GUI application for executing queries and interacting with the DB. I was not very happy with MongoVUE, but it seemed to be the best option at the time.
However, SQL DBs are very good at defining relationships in your data, e.g. referencing an entry in table A from an entry in table B, and that kind of thing. Using those relationships, you can join tables and do many interesting things. I think it is good for you to get some experience in this field as well.
MongoDB is not built for defining relationships (as far as I understand). It has the concept of "documents", where you store information in a JSON-like format (with nested key/values). You can query documents, but joining seems like hacking your way around its normal usage: How do I perform the SQL Join equivalent in MongoDB?
Also, ensuring data consistency (in a truly reliable manner) when using relationships in MongoDB seems pretty much impossible to me. But even if I am wrong and it is possible, it would be ten times harder to achieve than with SQL DBs.
But have a look at the list on Wikipedia; there might be a better alternative than MongoDB for you.
You can also use pure JSON with no DB system at all.
So, in summary, JSON-like storage has (at least) these issues:
Not good at defining and utilizing relationships
When using relationships, data integrity (or rather, referential integrity) is hard to ensure.
If you are not using a proper DB system but just dumping JSON into a file, you will hit performance problems once that file becomes too big. Imagine querying a 1GB JSON-encoded array of objects to get the ones you want: you have to load the entire array into memory and run through all of it (since you have no indices), and only then (if you have not run out of memory, and your network connection has not timed out) do you get a result; see the sketch after this list. Most NoSQL DBs like MongoDB, and most SQL DBs, have no such problems, at least within reasonable amounts of data. They are fine-tuned, they support indexing, references, permissions, and roles, and you can also define code that executes at the DB level (like triggers and stored procedures). Certainly they are more complex, but that complexity may be required most of the time to achieve the end result.
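To make that last point concrete, here is a Python sketch of what querying a flat JSON file actually costs (the file name and field are hypothetical): the entire array must be parsed into memory and scanned linearly, with no index to narrow the search.

import json

# Parse the ENTIRE file into memory, even to find one record.
with open("data.json") as f:
    records = json.load(f)  # 1GB of text becomes even more RAM as objects

# Linear scan: O(n) on every query, since there are no indices.
matches = [r for r in records if r.get("status") == "active"]

A database with an index on status would answer the same query without reading every record.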
JSON, or JavaScript Object Notation, is an open standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs. It is used primarily to transmit data between a server and web application, as an alternative to XML.
What you are really looking at is a comparison between a database and flat-file storage.
Even when using a relational DB, data integrity (or referential integrity) is still hard, because rows are usually timestamped. Quite often foreign keys are not enforced because of this. When a row update occurs you have two choices: either "forget" the previous version, or update the original row and copy the previous version into a timestamped "non-relational" history table where foreign keys are useless. Most business data requires updates. The features for maintaining referential integrity in relational databases are therefore useless for this type of business data (which represents most enterprise data).
What is needed is a temporal database, or an abstraction layer which presents the user with the appropriate version of a row based on a time context, ideally in two dimensions, i.e. transaction time and business time (a.k.a. valid time).

Comparison of JSON and User-Defined types in Postgres 9.3

I wonder why there is so much fuss about JSON support in Postgres 9.3. What are the advantages of JSON over user-defined types (UDTs)? What are the pitfalls of using UDTs? Is access to tables with UDTs inefficient? Is ALTER TYPE ... ADD ATTRIBUTE slow? How are UDTs physically stored by Postgres?
Please, explain and give links to additional information.
I think JSON is much more flexible than user-defined types: you can add whatever optional attributes you want, you can nest them, you can put them into lists;
JSON is a very readable format;
JSON is standard object notation in many languages (JavaScript, Python), so you can read data from a table and use it directly;
You don't have to create a new type every time you want to process data; you can create JSON, process it, then just forget about it. (A short sketch follows this list.)
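A minimal sketch of that flexibility, assuming psycopg2 and an invented events table; Postgres 9.3's ->> operator pulls a field out of the JSON as text:

import json
import psycopg2  # pip install psycopg2

conn = psycopg2.connect("dbname=appdb user=app")
cur = conn.cursor()

cur.execute("CREATE TABLE IF NOT EXISTS events (id serial PRIMARY KEY, body json)")
# Any shape of object can go in; no ALTER TYPE needed when the shape changes.
cur.execute("INSERT INTO events (body) VALUES (%s)",
            (json.dumps({"kind": "login", "user": "ada", "tags": ["web"]}),))
# Query into the JSON with the ->> operator (returns text).
cur.execute("SELECT body ->> 'user' FROM events WHERE body ->> 'kind' = %s",
            ("login",))
print(cur.fetchall())
conn.commit()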
As Roman Pekar mentioned in one of the previous answers, JSON support offers much more flexibility and makes it possible to mimic NoSQL behavior on top of a relational database.
Furthermore, it makes it easier in client-server applications to store JSON values sent from the client directly in the database.
One client of the application can use 30% of the fields, another client another 30%, and so on, without your having to define multiple tables or tables with a large set of columns. Thus, one can store large chunks of heterogeneous information in one place.
Last but not least, JSON is a standard and it's supported by many of the big programming languages.
(We are currently using the feature in our project, and have been using it since it was in beta; furthermore, this was the main reason we chose Postgres for our application, as we needed a big DB with mainly decoupled information. We tried using NoSQL databases, but we needed too many tables to store the information in, and it was costly on "joins". On the other hand, it would have been hard to cope with a purely relational DB, so instead of going half-relational, half-non-relational, we chose Postgres's JSON support.)
Less seriously, it is a little bit of marketing :). More seriously, built-in JSON support is the final stage of several years of work on this topic. There is no large difference between internal types and UDTs, and a lot of internal types (and functionality) started as UDTs or UDFs. Moving a feature upstream is a relatively hard process, and the concept (and API) is thoroughly tested and discussed. So an internal implementation guarantees significantly higher quality and stability (fewer errors, a more stable API) and support. The community is saying: "this is an interesting feature for us, and we want to enhance and support it". There are no other differences (in performance or storage format).
Some things about user defined types:
They have a fixed format (JSON is schemaless)
You can make assumptions about them.
You have more flexibility in addressing indexing concerns.
A composite type (one form of UDT) can have a JSON attribute.
User-defined types allow you to extend SQL around these types quite a bit more reliably than JSON does, but JSON gives you a lot more flexibility. Also, nested types are much better supported when done as composite types than as JSON objects in 9.3 (though this may change at some point). In 9.3 you cannot convert a JSON object to a composite type if the JSON object is nested at all.
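For contrast, a short psycopg2 sketch of the composite-type side (the type, table, and column names are invented): the shape is fixed and validated by the database, and one attribute can itself hold JSON, as noted above.

import psycopg2  # pip install psycopg2

conn = psycopg2.connect("dbname=appdb user=app")
cur = conn.cursor()

# A composite type has a fixed, validated shape; one attribute can be json.
cur.execute("CREATE TYPE book AS (title text, pages int, extras json)")
cur.execute("CREATE TABLE shelf (id serial PRIMARY KEY, b book)")
# Insert using ROW(...) literal syntax cast to the composite type.
cur.execute("INSERT INTO shelf (b) VALUES (ROW(%s, %s, %s)::book)",
            ("Sketches", 120, '{"isbn": "n/a"}'))
# Access individual attributes with (column).attribute syntax.
cur.execute("SELECT (b).title, (b).extras FROM shelf")
print(cur.fetchall())
conn.commit()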

Store JSON data as Text in MySQL

This is more of a concept/database-architecture question. In order to maintain data consistency, instead of a NoSQL data store I'm just storing JSON objects as strings/text in MySQL. So a MySQL row will look like this:
ID, TIME_STAMP, DATA
I'll store the JSON data in the DATA field. I won't be updating any rows; instead, I'll add new rows with the current timestamp. So when I want the latest data, I just fetch the row with max(timestamp). I'm using Tornado with the Python MySQLdb driver as my primary backend application.
I find this approach very straightforward and less prone to errors. The JSON objects are fairly simple and are not heavily nested.
Is this approach fundamentally wrong? Are there any issues with storing JSON data as text in MySQL, or should I use a file-system-based store such as HDFS? Please let me know.
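A minimal sketch of the pattern the question describes, using the Python MySQLdb driver it mentions (the table name is invented; the columns come from the question):

import json
import time
import MySQLdb  # pip install mysqlclient

conn = MySQLdb.connect(host="localhost", user="app", passwd="secret", db="appdb")
cur = conn.cursor()

def append(data):
    """Never update; always insert a new timestamped row."""
    cur.execute("INSERT INTO snapshots (TIME_STAMP, DATA) VALUES (%s, %s)",
                (int(time.time()), json.dumps(data)))  # assuming an integer timestamp column
    conn.commit()

def latest():
    """Fetch the row with the newest timestamp and decode its JSON."""
    cur.execute("SELECT DATA FROM snapshots ORDER BY TIME_STAMP DESC LIMIT 1")
    row = cur.fetchone()
    return json.loads(row[0]) if row else None

An index on TIME_STAMP keeps the max(timestamp) lookup cheap as rows accumulate.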
MySQL, as you probably know, is a relational database manager. It is designed for situations where data are related to each other through keys, forming relations which can then be used to yield complex retrievals of data. Your method will technically work (and be quite fast), but will probably (based on what I've seen so far) considerably limit your ability to leverage the technology you're using, should you expand the complexity of your scope!
I would recommend a database like MongoDB, which is designed for document storage, or a key-value store like Redis, rather than a relational architecture.
That said, if you find the approach works fine for what you're building, just go ahead. You might face some blockers up ahead if you want to add complexity to your solution but either way, you'll learn something new! Good luck!
Pradeeb, to help answer your question you need to analyze your use case. What kind of data are you storing? For me, this would be the deciding factor: every technology has specific use cases at which it excels.
I think it is safe to assume that you are using JSON because your data structures need to be very flexible documents, compared to a traditional relational DB. There are certain data stores that natively support such data structures, such as MongoDB (they call it "binary JSON", or BSON), as Phil pointed out. This would give you improved storage and/or improved search capabilities. Again, the utility depends entirely on your use case.
If you are looking for something like a job queue, horizontal scalability is not an issue, and you just need fast access to the latest data, you could use Redis, an in-memory key-value store that has a hash (associative array) data type and lists for exactly this kind of thing. Alternatively, since you mentioned HDFS and horizontal scalability may very well be an issue, I can recommend queue systems like Apache ActiveMQ or RabbitMQ.
Lastly, if you are writing heavily, and you are not client-limited but your data storage is your bottleneck, look into distributed, flexible-schema data stores like HBase or Cassandra. They offer flexible data schemas, are heavily write-optimized, and data can be appended and remains in chronological order, so you can fetch the newest data efficiently.
Hope that helps.
This is not a problem. You could also use the memcached storage engine in modern MySQL, which would be a good fit, although I have never tried it.
Another approach is to use memcached as a cache. Write everything to both memcached and MySQL. When you go to read data, try reading from memcached first; if it's not there, read from MySQL and repopulate the cache. This is a common technique for reducing database bottlenecks (see the sketch below).
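Here is a sketch of that read-through caching pattern in Python, using the pymemcache client (the key scheme and table are invented for illustration):

import json
import MySQLdb
from pymemcache.client.base import Client  # pip install pymemcache

cache = Client(("localhost", 11211))
conn = MySQLdb.connect(host="localhost", user="app", passwd="secret", db="appdb")

def get_doc(doc_id):
    key = "doc:%d" % doc_id
    cached = cache.get(key)                # returns bytes, or None on a miss
    if cached is not None:
        return json.loads(cached)          # cache hit: no DB round trip
    cur = conn.cursor()
    cur.execute("SELECT DATA FROM snapshots WHERE ID = %s", (doc_id,))
    row = cur.fetchone()
    if row is None:
        return None
    # Repopulate the cache so the next read skips MySQL (5-minute TTL).
    cache.set(key, row[0].encode("utf-8"), expire=300)
    return json.loads(row[0])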

Which database suits my application, MySQL or MongoDB? (Using Node.js, Backbone, Now.js)

I want to make an application like docs.google.com (without its API, completely on my own server) using
frontend: Backbone
backend: Node
Which database do you think is better, MySQL or MongoDB? It should support good scalability.
I am familiar with MySQL from PHP, and I will be happy if the answer is MySQL.
But in many tutorials I saw, they used MongoDB. Why did they use MongoDB and not MySQL?
What should I use?
Can anyone give me a link to a sample application (with source) built using Backbone, Node, and MySQL (or Mongo)? Or at least an app with Node and MySQL?
Thanks
With MongoDB, you can just store JSON objects and retrieve them fully-formed, so you don't really need an ORM layer and you spend less CPU time translating your data back-and-forth. The developers behind MongoDB have also made horizontally scaling the database a higher priority and let you run arbitrary Javascript code to pre-process data on the DB side (allowing map-reduce style filtering of data).
But you lose some things for these gains: you can't join records. Actually, the JSON structure you store could only be produced via joins in SQL, but in MongoDB that one structure is the only shape your data has, while in SQL you can query differently and get your data represented in alternate ways much more easily. So if you need to do a lot of analytics on your database, MongoDB will make that harder.
The query language in MongoDB is "rougher", in my opinion, than SQL's, partly because it's less familiar, and partly because the querying features feel haphazardly put together: partly to make them valid JSON, and partly because there are literally a couple of ways of doing the same thing, some of them older ways that aren't as useful or as regularly formatted as the others. And there's the added complexity of the array and sub-object types over SQL's simple row-based design, so the syntax has to be able to handle querying for arrays that contain some of the values you defined, contain all of the values you defined, contain only the values you defined, and contain none of the values you defined. The same distinctions apply to object keys and their values, and this makes the query syntax harder to grasp. (And while I can see the need for edge cases, the $where query parameter, which takes a JavaScript function that runs on every record and returns a boolean, is a siren song: it makes it easy to define which objects you want returned, but it has to run on every record in the database, and no indexes can be used.)
So, it depends on what you want to do, but since you say it's for a Google Docs clone, you probably don't care about any representation but the document representation, itself, and you're probably only going to query based on document ID, document name, or the owner's ID/name, nothing too complex in the querying.
Then I'd say being able to take the JSON representation of the document your user is editing, throw it straight into the database, and have it automatically index the important fields is worth the price of learning a new database.
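For illustration, a pymongo sketch of exactly that workflow (the database, collection, and field names are invented): store the editor's JSON as-is and index only the fields you will query by.

from pymongo import MongoClient  # pip install pymongo

client = MongoClient("mongodb://localhost:27017")
docs = client.docsclone.documents

# Index the handful of fields the answer says you'd query on.
docs.create_index("owner_id")
docs.create_index("name")

# Store the editor's JSON representation directly, no ORM mapping.
doc_id = docs.insert_one({
    "name": "Quarterly report",
    "owner_id": 42,
    "body": {"paragraphs": ["...", "..."]},
}).inserted_id

# Retrieve it fully formed, by id or by owner.
report = docs.find_one({"_id": doc_id})
mine = list(docs.find({"owner_id": 42}))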
I was also struggling with this choice, looking at the hype created by using MongoDB for tasks it was not built for. So here are my 2 cents:
Storing and retrieving hierarchical objects, which your documents probably are, is easier in MongoDB, as David says. It becomes more complicated if you want to store documents bigger than 16MB, though; MongoDB's answer to that is GridFS.
Organizing documents in folders and groups, and keeping track of which user owns which documents and who was granted access to them, is definitely easier with MySQL: you have the advantage of powerful SQL queries with joins, built-in EXPLAIN optimization, triggers, functions, stored procedures, etc. MongoDB is nowhere near that.
So what prevents you from using both: MySQL to organize the documents, and MongoDB to store one collection of documents identified by id (or several collections, one for each document type)? It seems to me the best choice, and using two databases in one application is really not a problem.
MySQL will store users, groups, folders, and permissions (whatever you fancy), and for each document it will store a reference to the collection and the document id (MongoDB has a special format for this: DBRefs). MongoDB will store the documents themselves in collections, if they are all less than 16MB, or the previews and metadata of documents in collections and the whole documents in GridFS (see the sketch below).
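A rough Python sketch of that split (all database, table, and column names are hypothetical): MongoDB holds the document body, while MySQL records ownership and placement and keeps a reference to the Mongo document id.

import MySQLdb
from pymongo import MongoClient

mongo = MongoClient("mongodb://localhost:27017")
mysql = MySQLdb.connect(host="localhost", user="app", passwd="secret", db="appdb")

def create_document(owner_id, folder_id, body):
    # 1) Store the document body itself in MongoDB.
    mongo_id = mongo.docsclone.documents.insert_one(body).inserted_id
    # 2) Record ownership/placement in MySQL, keeping the Mongo id as a reference.
    cur = mysql.cursor()
    cur.execute(
        "INSERT INTO documents (owner_id, folder_id, mongo_ref) VALUES (%s, %s, %s)",
        (owner_id, folder_id, str(mongo_id)),
    )
    mysql.commit()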
David provided a good answer. A few things to add to it.
MongoDB's flexible nature permits easy agile/iterative development.
MongoDB, like Node.js, is asynchronous in nature and works very well within asynchronous environments.
Mongoose is a good ODM (object-document mapper) that makes working with MongoDB from Node.js feel very natural. Unlike ORMs, this is a very thin layer.
For Google Docs-like functionality, the flexibility and very rich data structures provided by MongoDB feel like a much better fit.
You can find some good example posts by searching for Mongoose, Node, and MongoDB.
Here's one that also uses Backbone.js and looks good: http://mattkopala.com/blog/2012/02/12/getting-started-with-nodejs/