MongoDB, MySQL and relationships

I'm creating an online chat.
Context (if needed):
So far I was using PHP/MySQL and AJAX to do the job but this is not a healthy solution as I'm stuck with a "pull" type application with concerns about scalability.
I read about the "push" method alternatives and it seems that my choices are limited and exclude PHP.
WebSockets would be a very interesting option if they were supported in every browser, but that's not the case (and it seems that most of the browsers that implement them disable them by default).
Long polling would also be a candidate but it involves other issues like the number of concurrent open connections that may kill your web app too.
This is why, against my will, I think that my only viable option is to use server-side JavaScript (node.js + now.js would be my choice then).
This said, I may need to rethink the use of a database too.
I need to store data for each user and link these users to the messages they submit.
For a chat engine driven by a push system, would MySQL still be a good choice?
I read about NoSQL data management and it seems that MongoDB would be a good addition to node.js.
My two questions:
Is there a reason I'm better off moving to a NoSQL system (which I need to learn from scratch) instead of MySQL (which I know already) for a real-time web app?
Let's say that in MySQL:
I have a table called user (user_id_p, username)
I have a table called messages (message_id, message, user_id_f)
I want to make a single query to get all the messages associated with the username "omgtheykilledkenny".
Simple enough but how can I achieve that with MongoDB and its collections philosophy?
Thank you for your help.

Working with node.js/MongoDB is cool because Mongo's document structure is already JSON-like, so you don't have to convert your queries to JSON. If you already know JavaScript, you have a head start learning MongoDB. Mongo scales for writes and reads pretty easily and the speed is pretty awesome. I've seen some MySQL benchmarks on a single system that compare well to Mongo, but Mongo really shines when you start needing multiple boxes.
Assuming you have a separate messages collection and you already know the id of the user, you could just do: db.messages.find({user_id:ObjectId(...)});
Update: If you don't know the user id, then you need to do two queries, yes (unless you use an embedded array as recommended in the other answer--I would advise against that for this sort of use case, though, because you'll end up querying the entire document/list of messages even to display just a subset). Depending on your use case, obviously, if you have the username, you could also keep the user id handy, for situations like this. If it's client input giving the username that wouldn't work.
Update2: If you have unique usernames, you could make the username the _id for the users collection to avoid this issue. Most people would probably advise against this, and it has some definite drawbacks, such as making it harder to change a username.
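The two-query approach described above can be sketched as follows. This is only an illustration in plain Python: the lists stand in for the users and messages collections, and the helper names (find_one, find, messages_for_username) and sample data are made up for the example, not part of any MongoDB driver.

```python
# In-memory stand-ins for two MongoDB collections (hypothetical sample data).
users = [
    {"_id": 1, "username": "omgtheykilledkenny"},
    {"_id": 2, "username": "cartman"},
]
messages = [
    {"_id": 10, "user_id": 1, "message": "Hello"},
    {"_id": 11, "user_id": 2, "message": "Respect my authority"},
    {"_id": 12, "user_id": 1, "message": "Anyone there?"},
]


def find_one(collection, **criteria):
    """Return the first document matching every key/value pair, or None."""
    for doc in collection:
        if all(doc.get(k) == v for k, v in criteria.items()):
            return doc
    return None


def find(collection, **criteria):
    """Return every document matching every key/value pair."""
    return [d for d in collection if all(d.get(k) == v for k, v in criteria.items())]


def messages_for_username(username):
    # Query 1: resolve the username to a user id.
    user = find_one(users, username=username)
    if user is None:
        return []
    # Query 2: fetch the messages that reference that id.
    return find(messages, user_id=user["_id"])
```

If the application already holds the user id (for example in the session), the first step disappears and only the second query remains.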

You can't perform joins in MongoDB, so you can't achieve your second requirement. The Mongo way to do this would be either to nest messages within each document of the user collection:
{ username: 'abc', messages: [...]}
Or use references (DBRefs), which are a kind of half-way house between joins and nested documents:
http://uk3.php.net/manual/en/class.mongodbref.php
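To make the two shapes concrete, here is a small plain-Python sketch. The dictionaries only imitate documents; the field names, ids and the helpers latest_embedded and resolve_ref are assumptions made for the example. The {"$ref", "$id"} layout follows the DBRef convention.

```python
# Shape 1: messages nested inside the user document (hypothetical data).
user_embedded = {
    "username": "abc",
    "messages": [
        {"text": "first", "sent": 1},
        {"text": "second", "sent": 2},
        {"text": "third", "sent": 3},
    ],
}

# Shape 2: messages live in their own collection and point back at the user.
users = {"u1": {"_id": "u1", "username": "abc"}}
messages = [
    {"text": "first", "sent": 1, "user": {"$ref": "users", "$id": "u1"}},
    {"text": "second", "sent": 2, "user": {"$ref": "users", "$id": "u1"}},
]
database = {"users": users}  # collection name -> documents keyed by _id


def latest_embedded(user_doc, n):
    """The whole user document is already in hand; slice the last n messages."""
    return sorted(user_doc["messages"], key=lambda m: m["sent"])[-n:]


def resolve_ref(ref, db):
    """Follow a reference by hand: the application does the lookup, not the database."""
    return db[ref["$ref"]][ref["$id"]]
```

With the embedded shape a single read returns everything, at the cost of loading every message even when only a few are shown. With references each message stays small, but resolving the author is an extra lookup.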
In terms of switching from MySQL to Mongo, you don't necessarily need to ditch MySQL entirely. There are use cases where one is more appropriate than the other. You could use both for different parts of the system if it's appropriate to do so. Personally, I've used MySQL for a lot of things in the past, and I'm using MongoDB for a big project at the moment. I found the move very easy to make, because it's so easy to use the MongoDB driver, and the MongoDB site is very good for documentation on the whole.
You can convert to and from JSON with json_encode and json_decode from the front end, and you query and insert/update with arrays with MongoDB's PHP driver, so it's arguably more intuitive and easier to use than MySQL. It's just a question of getting used to it.

Related

How to migrate data from mongodb to mysql?

I am currently working on an analytics-style application. I have an AngularJS app that communicates with a Spring REST client app, in which a user creates a token (trackingID) and then puts the generated script with this ID on their website to collect information about visitors' actions through a second Spring REST tracking app. For the tracking app I am using MongoDB to collect visitor actions and visitor info, for fast insertion; for the REST client app I am using MySQL, which holds the user/account details.
My question is how to migrate the MongoDB data from the tracking app to MySQL, so that I can use joins to analyze the data easily and quickly with any kind of filter from the AngularJS client app. Should I manually write workers that periodically transfer the data added since the last run from MongoDB to MySQL, or are there existing tools that can be set up to do this transfer?
There is no official library to do this.
But you can use MongoDB's mongoexport tool to export the data in CSV format, and mysqlimport to import it into MySQL.
See the documentation for mysqlimport and mongoexport.
Another method is to write a program in your favorite language that reads from MongoDB and writes into MySQL.
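A minimal sketch of such a transfer program, under several assumptions: the documents are shown as a hard-coded list rather than read through a MongoDB driver, sqlite3 from the standard library stands in for MySQL, and the table, column and field names are invented for the example.

```python
import sqlite3

# Hypothetical documents as they might come back from the tracking collection.
tracking_docs = [
    {"_id": "a1", "trackingId": "T-1", "action": "click", "visitor": {"ip": "10.0.0.1"}},
    {"_id": "a2", "trackingId": "T-1", "action": "scroll", "visitor": {"ip": "10.0.0.2"}},
    {"_id": "a3", "trackingId": "T-2", "action": "click"},  # no visitor info
]


def to_row(doc):
    """Flatten one document into a fixed-width tuple; missing fields become None."""
    visitor = doc.get("visitor") or {}
    return (doc["_id"], doc.get("trackingId"), doc.get("action"), visitor.get("ip"))


def transfer(docs, conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS visitor_action ("
        "id TEXT PRIMARY KEY, tracking_id TEXT, action TEXT, visitor_ip TEXT)"
    )
    # INSERT OR REPLACE is SQLite syntax; it keeps a re-run of the same batch
    # from duplicating rows. MySQL would need its own upsert form.
    conn.executemany(
        "INSERT OR REPLACE INTO visitor_action VALUES (?, ?, ?, ?)",
        [to_row(d) for d in docs],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
transfer(tracking_docs, conn)
transfer(tracking_docs, conn)  # running the same batch again adds nothing
```

Keying the target table on the document's _id is what makes a periodic worker safe to re-run after a partial failure.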
MySQL 5.7 has a new JSON data type, that can be very convenient.
You can create a table in MySQL to receive the JSON messages as is, and then either query it with SQL or post-process it to load the data into a structured set of database tables.
Check this out: https://dev.mysql.com/doc/refman/5.7/en/json.html
I realise this question is a few years old - but recently I've had a number of people enquiring whether a tool I developed (https://virtual.blue/apps/json-converter) can do exactly what the OP is asking (convert MongoDB to SQL) so I am guessing it is still something people want. Keep reading to find out why I am honestly not surprised by this.
The short answer to whether the tool can help you is: perhaps. If your existing data relationships are not too complicated, and your database is not enormous, it may well be worth a try.
However, I thought it might help to try and explain what the issues are with this kind of conversion, since all the answers I have seen so far are along the lines of "try tool X" or "first convert to format Y and then you can slurp it into MySQL using utility Z". ie there is no thought to whether what you get at the end of doing this is going to make sense in terms of data relationships and integrity.
For example, you could just stick your entire database dump in a single field of a single SQL table (ok space limitations might prevent this in reality, but hopefully you get my point). Then your database would be "in MySQL format", but it would be absolutely no use to anyone.
The point is, what you actually want is a fully defined database model, correctly encapsulating all of the intrinsic data relationships. ("Database normalization" as it is known.) If your conversion process gets those relationships wrong, then you have a broken model, and any queries you try to run over it are likely to return nonsense. Unfortunately there is no magic tool that is just going to "know" the best way to represent your data in MySQL, and closing your eyes and shovelling it into a bunch of random tools is unlikely to miraculously get you what you want.
And herein lies the fundamental problem with the "NoSQL" philosophy (fad). They sold people the bogus notion of "non-relational data". My first thought when I heard this was, "How does that work? Surely all data is relational?" By the looks of things we are steadily getting more and more evidence that my instincts were right. ("NoSQL? Why stop there? I go with 'NoDatabase'. It returns no results at all, but it sure is fast!")
The NoSQL madness throws several important fundamental engineering principles to the wind. We shouted "don't hard code!", "DRY!" (Don't Repeat Yourself) because these actions infuse inflexibility into systems. Traditional wisdom makes precisely the same flexibility argument when it advises "create a fully described model with all the data relationships represented". Then you can execute any arbitrary query over it and expect meaningful results. "Yes but there are a whole bunch of queries we are never going to need to run," says the NoSQL proponent. But surely we learnt our lesson on things we are "never going to need to do"? ("I hard code liberally, because I know I am never going to want to change my code." Hmm...)
The arguments about speed are largely moot. Say it turns out you are frequently doing a complex 9 table join, with unsurprisingly sluggish performance. So create an index. Cache it. Swap some disk space for speed. The NoSQL philosophy is to swap data integrity for speed, which makes no sense at all.
When you generate your fast lookup index (cache/table/map/whatever) what you are really doing is creating a view over your model. If your model changes, you can readily update your view. Going from a model to a view is easy - it's a one to many operation and you are on the right side of entropy.
However, when you went with MongoDB you effectively decided to create views without bothering to describe your fundamental model. Now you discover there are queries you want to run, but can't - and so it's no wonder you want to move over to SQL and actually have your data modelled correctly. The problem is you now want to go from a view to a model. Now you're on the wrong side of entropy. Your view is a lossy representation of the model's fundamental relationships. You can't expect a tool to "translate" your database, because you are asking it to insert new relationships which were not originally defined. These are real world relationships that are not machine-guessable. The tool cannot know what relationships were intended.
In short the only way you can do this reliably is to get your hands dirty. An intelligent human, with complete understanding of the system you are modelling needs to sit down and carefully come up with (possibly a substantial amount of) code which effectively picks through the data and resolves all of the insufficiently represented data relationships. If your data is complex then it's going to be a headache and there is no way to cheat.
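As a small illustration of why this cannot be fully automated, consider documents where author details were copied into every post. The sketch below is plain Python with invented data; choosing the email address as the author's identity is an assumption a person has to make, and the code can only flag the cases where that choice leaves a contradiction.

```python
from collections import defaultdict

# Hypothetical denormalized documents: author details were copied into each post.
posts = [
    {"_id": 1, "title": "A", "author": {"email": "kim@example.com", "name": "Kim"}},
    {"_id": 2, "title": "B", "author": {"email": "kim@example.com", "name": "Kim L."}},
    {"_id": 3, "title": "C", "author": {"email": "raj@example.com", "name": "Raj"}},
]


def normalize(docs):
    """Split docs into author rows and post rows, keyed by author email.

    Returns (author_rows, post_rows, conflicts). Authors whose copies disagree
    are left out of author_rows and reported in conflicts instead.
    """
    names_seen = defaultdict(set)
    post_rows = []
    for doc in docs:
        email = doc["author"]["email"]
        names_seen[email].add(doc["author"]["name"])
        post_rows.append((doc["_id"], doc["title"], email))
    # One email with several names cannot be resolved mechanically.
    conflicts = {e: sorted(n) for e, n in names_seen.items() if len(n) > 1}
    author_rows = sorted((e, next(iter(n))) for e, n in names_seen.items() if len(n) == 1)
    return author_rows, post_rows, conflicts


author_rows, post_rows, conflicts = normalize(posts)
```

Whether "Kim" and "Kim L." are one person who changed their display name or two people sharing a mailbox is exactly the kind of real-world relationship the data does not record.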
If your data is still relatively simple then I would suggest making the conversion as soon as possible, before it becomes difficult. In this case my tool (https://virtual.blue/apps/json-converter) may be able to help.
(They really should have asked a Physicist before they came up with all this nonsense...!)
You can download a trial version of Studio 3T for Mongo and export your database to SQL (or JSON) directly

Which database suits my application, MySQL or MongoDB? Using Node.js, Backbone, Now.js

I want to make an application like docs.google.com (without its API, completely on my own server) using:
frontend: Backbone
backend: Node
Which database do you think is better, MySQL or MongoDB? It should scale well.
I am familiar with MySQL with PHP, and I will be happy if the answer is MySQL.
But many tutorials I saw used MongoDB. Why did they use MongoDB rather than MySQL?
What should I use?
Can anyone give me a link to a sample application (with source) built using Backbone, Node and MySQL (or Mongo), or at least an app built with Node and MySQL?
Thanks
With MongoDB, you can just store JSON objects and retrieve them fully-formed, so you don't really need an ORM layer and you spend less CPU time translating your data back-and-forth. The developers behind MongoDB have also made horizontally scaling the database a higher priority and let you run arbitrary Javascript code to pre-process data on the DB side (allowing map-reduce style filtering of data).
But you give some things up for these gains: you can't join records. The JSON structure you store could only be produced via joins in SQL, but in MongoDB you have only that one structure for your data, while in SQL you can query differently and get your data represented in alternate ways much more easily. So if you need to do a lot of analytics on your database, MongoDB will make that harder.
The query language in MongoDB is "rougher", in my opinion, than SQL's, partly because it's less familiar, and partly because the querying features "feel" haphazardly put together, partially to make it valid JSON, and partially because there are literally a couple of ways of doing the same thing, and some are older ways that aren't as useful or regularly-formatted as the others. And there's the added complexity of the array and sub-object types over SQL's simple row-based design, so the syntax has to be able to handle querying for arrays that contain some of the values you defined, contain all of the values you defined, contain only the values you defined, and contain none of the values you defined. The same distinctions apply to object keys and their values, and this makes the query syntax harder to grasp. (And while I can see the need for edge-cases, the $where query parameter, which takes a javascript function that is run on every record of the data and returns a boolean, is a Siren song because you can easily define what objects you want to return or not, but it has to run on every record in the database, no indexes can be used.)
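The four array-matching cases listed above are easier to tell apart when written out as set operations. The following is a plain-Python sketch, not MongoDB syntax; the function names and sample documents are made up, and the operator names in the comments are only rough counterparts.

```python
def contains_any(field, wanted):
    """At least one wanted value is present (roughly what $in does on an array field)."""
    return bool(set(field) & set(wanted))


def contains_all(field, wanted):
    """Every wanted value is present, extras allowed (roughly $all)."""
    return set(wanted) <= set(field)


def contains_only(field, wanted):
    """Nothing outside the wanted values is present."""
    return set(field) <= set(wanted)


def contains_none(field, wanted):
    """No wanted value is present (roughly $nin)."""
    return not (set(field) & set(wanted))


# Hypothetical documents with an array-valued field.
docs = [
    {"_id": 1, "tags": ["red", "blue"]},
    {"_id": 2, "tags": ["red"]},
    {"_id": 3, "tags": ["green"]},
    {"_id": 4, "tags": []},
]
wanted = ["red", "blue"]


def ids_matching(match):
    # Like a $where function, this runs the predicate against every document.
    return [d["_id"] for d in docs if match(d["tags"], wanted)]
```

Note how the empty array in document 4 satisfies both "only" and "none", which is one of the edge cases that makes this part of the query language harder to reason about than a simple row comparison.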
So, it depends on what you want to do, but since you say it's for a Google Docs clone, you probably don't care about any representation but the document representation, itself, and you're probably only going to query based on document ID, document name, or the owner's ID/name, nothing too complex in the querying.
Then, I'd say being able to take the JSON representation of the document your user is editing, and just throw it into the database and have it automatically index these important fields, is worth the price of learning a new database.
I was also struggling with this choice looking at the hype created by using MongoDB for tasks it was not built for. So my 2 cents are:
Storing and retrieving hierarchical objects, which your documents probably are, is easier in MongoDB, as David says. It becomes more complicated if you want to store documents that are bigger than 16 MB though - MongoDB's answer is GridFS.
Organising documents in folders and groups, and keeping track of which user owns which documents and whom they gave access to, is definitely easier with MySQL - you have the advantage of powerful SQL queries with joins etc., built-in EXPLAIN for query optimization, triggers, functions, stored procedures, etc. MongoDB is nowhere near.
So what prevents you from using both MySQL to organize the documents and MongoDB to store one collection of documents identified by id (or several collections - one for each document type)? It seems to me the best choice and using two databases in one application is not a problem, really.
MySQL will store users, groups, folders, permissions - whatever you fancy - and for each document it will store a reference to the collection and the document id (MongoDB has a special format for it - DBRefs). MongoDB will store documents themselves in collections, if they are all less than 16MB, or the previews and metadata of documents in collections and the whole documents in GridFS.
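A rough sketch of that split, with stand-ins for both databases: sqlite3 plays the relational side and a dictionary plays the document collection. Table names, columns and sample rows are all assumptions made for the example.

```python
import sqlite3

# Stand-in for a MongoDB collection: document id -> document body.
document_store = {
    "doc-1": {"title": "Budget", "body": {"sheets": [{"name": "Q1"}]}},
    "doc-2": {"title": "Notes", "body": {"paragraphs": ["hello"]}},
}

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE document (
        id INTEGER PRIMARY KEY,
        owner_id INTEGER REFERENCES app_user(id),
        collection TEXT,   -- which document collection holds the body
        doc_ref TEXT       -- id of the body inside that collection
    );
    CREATE TABLE permission (document_id INTEGER, user_id INTEGER);
    INSERT INTO app_user VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO document VALUES (1, 1, 'documents', 'doc-1'), (2, 2, 'documents', 'doc-2');
    INSERT INTO permission VALUES (2, 1);  -- bob shared document 2 with ann
    """
)


def documents_visible_to(user_id):
    """The relational side decides which documents; the store supplies the bodies."""
    rows = conn.execute(
        """
        SELECT d.doc_ref FROM document d
        LEFT JOIN permission p ON p.document_id = d.id AND p.user_id = ?
        WHERE d.owner_id = ? OR p.user_id IS NOT NULL
        ORDER BY d.id
        """,
        (user_id, user_id),
    ).fetchall()
    return [document_store[ref] for (ref,) in rows]
```

The join answers the ownership and sharing question in one statement, and the document bodies never need to be decomposed into tables.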
David provided a good answer. A few things to add to it.
MongoDB's flexible nature allows for easy agile / iterative development.
MongoDB, like node.js, is asynchronous in nature and works very well within asynchronous environments.
Mongoose is a good ODM (object document mapper) that makes working with MongoDB with Node.js feel very natural. Unlike ORMs this is a very thin layer.
For Google Docs-like functionality, the flexibility & very rich data structure provided by MongoDB feels like a much better fit.
You can find some good example posts by searching for mongoose, node and MongoDB.
Here's one that also uses backbone.js and looks good http://mattkopala.com/blog/2012/02/12/getting-started-with-nodejs/

combination mysql mongodb

I am building a web application that needs to be scalable. In a nutshell:
We have users, and users have friends, so each user has a friend list. Users can create messages, and messages from your friends are displayed on the homepage. Each message is linked to a location, and messages can be filtered by date: for example, display all the messages from my friends that were posted yesterday, or all messages from location X.
I am now building the application fully in MongoDB, but I am running into trouble at the moment. For example:
On the main page we show the message list of the user's friends. No problem, we use:
$db->messages->find(array('users._id' => array('$in' => $userFriendListGoesHere)));
So then we have our messages. However, each message has a location, so I have to loop through all the messages and get the location from another collection. Multiple users can also be bound to a single message, so we have to get all the user data from another collection as well. In MySQL simply a join query, in MongoDB 2 loops, and this is my first question: is this a problem? Does this require a lot of resources, the looping?
So my idea is to split the data between MySQL and MongoDB: MongoDB to store all the locations (there are more than 350,000 of them, and I use lat/long calculations), and MySQL for the messages, users and users' friends. My second question: can you help me with my decision? Should I keep using MongoDB with the loops, or use a combination?
Thanks for reading and your time.
.. in MySQL simply a join query, in MongoDB 2 loops, and this is my first question: is this a problem?
This is par for the course with MongoDB, in fact, it's a core MongoDB trade-off.
MongoDB is based on the precept that joins do not scale. So it has no joins and leaves you to "roll your own". Some libraries like Morphia (for Java) provide built-in logic for loading references.
PHP has the Doctrine project, which should help with some of this.
Does this require a lot of resources, the looping?
Kind of? This will really depend on implementation.
It's obviously going to involve a bunch of back and forth with the DB, but it may be less network traffic than the SQL version. You will need memory space for all of the data coming back. But again, that's not terribly different from SQL.
Really, it's up to you to make all of the trade-offs about how this is implemented and who is keeping what in memory.
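One common way to keep the "back and forth" small is to collect all the ids first and fetch each related collection once, instead of once per message. The sketch below is plain Python over in-memory lists; the collection contents and the helpers fetch_by_ids and hydrate are invented for the example, and fetch_count only simulates counting round trips.

```python
# Hypothetical collections: lists of dicts standing in for MongoDB.
locations = [{"_id": "L1", "name": "Cafe"}, {"_id": "L2", "name": "Park"}]
users = [{"_id": "U1", "name": "ann"}, {"_id": "U2", "name": "bob"}]
messages = [
    {"_id": 1, "text": "hi", "location_id": "L1", "user_ids": ["U1", "U2"]},
    {"_id": 2, "text": "yo", "location_id": "L1", "user_ids": ["U2"]},
    {"_id": 3, "text": "hey", "location_id": "L2", "user_ids": ["U1"]},
]

fetch_count = 0  # number of simulated round trips to the database


def fetch_by_ids(collection, ids):
    """One simulated round trip, comparable to a single find with $in on _id."""
    global fetch_count
    fetch_count += 1
    wanted = set(ids)
    return {doc["_id"]: doc for doc in collection if doc["_id"] in wanted}


def hydrate(msgs):
    # Gather every referenced id up front so each collection is queried once.
    location_ids = {m["location_id"] for m in msgs}
    user_ids = {uid for m in msgs for uid in m["user_ids"]}
    location_by_id = fetch_by_ids(locations, location_ids)
    user_by_id = fetch_by_ids(users, user_ids)
    # The "join" itself is then dictionary lookups in application memory.
    return [
        {
            "text": m["text"],
            "location": location_by_id[m["location_id"]]["name"],
            "users": [user_by_id[uid]["name"] for uid in m["user_ids"]],
        }
        for m in msgs
    ]


hydrated = hydrate(messages)
```

Three messages are resolved with two lookups rather than one lookup per message per reference; the memory cost is the two dictionaries held while the page is built.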
should I keep using MongoDb with the loops
MongoDB is a great idea when your data is not inherently relational.
In the example you provided, it kinda seems like your data is relational. MySQL and other relational DBs (such as Postgres) are better data stores than MongoDB for relational data. This blog post covers this topic in more detail.
In summary, I'd recommend the following:
Please spend some time analyzing whether your data is inherently relational or not.
If it is not, then MongoDB can give you benefits over using MySQL.
If it is relational, then MySQL is the better solution.
Using both is, of course, possible - but it will create additional work & complexity for you. In the long term - is that worth the effort? Only you will know the answer.
Best of luck with your web app!

Which is the right database for the job?

I am working on a feature and could use opinions on which database I should use to solve this problem.
We have a Rails application using MySQL. We have no issues with MySQL and it runs great. But for a new feature, we are deciding whether to stay with MySQL or not. To simplify the problem, let's assume there is a User and Message model. A user can create messages. The message is delivered to other users based on their association with the poster.
Obviously there is an association based on friendship but there are many many more associations based on the user's profile. I plan to store some metadata about the poster along with the message. This way I don't have to pull the metadata each time when I query the messages.
Therefore, a message might look like this:
{
  id: 1,
  message: "Hi",
  created_at: 1234567890,
  metadata: {
    user_id: 555,
    category_1: null,
    category_2: null,
    category_3: null,
    ...
  }
}
When I query the messages, I need to be able to query based on zero or more metadata attributes. This call needs to be fast and occurs very often.
Due to the number of metadata attributes and the fact any number can be included in a query, creating SQL indexes here doesn't seem like a good idea.
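The query shape described here, filtering on whichever metadata attributes happen to be supplied, can be sketched as below. This is only an illustration: sqlite3 stands in for the real database, the flat table layout and the sample rows are assumptions, and build_query and search are hypothetical helper names. It shows the access pattern, not a solution to the indexing concern.

```python
import sqlite3

# Columns a caller may filter on. The whitelist keeps caller input out of the
# SQL text; the category_* names are taken from the example message above.
FILTERABLE = ("user_id", "category_1", "category_2", "category_3")


def build_query(filters):
    """Return (sql, params) for zero or more equality filters."""
    unknown = set(filters) - set(FILTERABLE)
    if unknown:
        raise ValueError(f"cannot filter on: {sorted(unknown)}")
    clauses, params = [], []
    for column in FILTERABLE:
        if column not in filters:
            continue
        value = filters[column]
        if value is None:
            clauses.append(f"{column} IS NULL")  # "= NULL" would never match
        else:
            clauses.append(f"{column} = ?")
            params.append(value)
    sql = "SELECT id FROM message"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY id", params


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE message (id INTEGER PRIMARY KEY, message TEXT, created_at INTEGER,"
    " user_id INTEGER, category_1 TEXT, category_2 TEXT, category_3 TEXT)"
)
conn.executemany(
    "INSERT INTO message VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "Hi", 1234567890, 555, "a", None, None),
        (2, "Yo", 1234567891, 555, "b", "x", None),
        (3, "Hey", 1234567892, 777, "a", "x", None),
    ],
)


def search(**filters):
    sql, params = build_query(filters)
    return [row[0] for row in conn.execute(sql, params)]
```

Because any subset of columns can appear in the WHERE clause, no single composite index serves every call, which is the concern raised above.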
Personally, I have experience with MySQL and MongoDB. I've started research on Cassandra, HBase, Riak and CouchDB. I could use some help from people who might have done the research as to which database is the right one for my task.
And yes, the messages table can easily grow into millions of rows.
This is a very open ended question, so all we can do is give advice based on experience. The first thing to consider is if it's a good idea to decide on using something you haven't used before, instead of using MySQL, which you are familiar with. It's boring not to use shiny new things when you have the opportunity, but believe me that it's terrible when you've painted yourself into a corner because you thought that the new toy would do everything it said on the box. Nothing ever works the way it says in the blog posts.
I mostly have experience with MongoDB. It's a terrible choice unless you want to spend a lot of time trying different things and realizing they don't work. Once you scale up a bit you basically can't use things like secondary indexes, updates, and other things that make Mongo an otherwise awesomely nice tool (most of this has to do with its global write lock and the database format on disk, it basically sucks at concurrency and fragments really easily if you remove data).
I don't agree that HBase is out of the question. It doesn't have secondary indexes, but you can't use those anyway once you get above a certain traffic load. The same goes for Cassandra (which is easier to deploy and work with than HBase). Basically you will have to implement your own indexing whichever solution you choose.
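"Implementing your own indexing" on top of a store that only looks things up by primary key usually means maintaining a second table that maps attribute values back to keys. The following is a generic plain-Python sketch of that idea, not the API of HBase, Cassandra or any other product; the class and method names are invented.

```python
from collections import defaultdict


class IndexedStore:
    """Key-value table plus one hand-maintained index table per chosen field."""

    def __init__(self, indexed_fields):
        self.rows = {}  # primary key -> row
        # field name -> (field value -> set of primary keys)
        self.indexes = {f: defaultdict(set) for f in indexed_fields}

    def put(self, key, row):
        old = self.rows.get(key)
        if old is not None:
            self._unindex(key, old)  # stale index entries must be removed by hand
        self.rows[key] = row
        for field, index in self.indexes.items():
            if field in row:
                index[row[field]].add(key)

    def _unindex(self, key, row):
        for field, index in self.indexes.items():
            if field in row:
                index[row[field]].discard(key)

    def find_by(self, field, value):
        """Look up rows through the index table instead of scanning every row."""
        keys = self.indexes[field].get(value, set())
        return [self.rows[k] for k in sorted(keys)]


store = IndexedStore(["user_id"])
store.put("m1", {"user_id": 555, "text": "Hi"})
store.put("m2", {"user_id": 777, "text": "Yo"})
store.put("m1", {"user_id": 777, "text": "Hi"})  # overwrite moves m1 in the index
```

Every write now touches the row and each index table, and in a real distributed store those writes are not atomic, which is where most of the work in a hand-rolled index goes.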
What you should consider is things like if you need consistency over availability, or vice versa (e.g. how bad is it if a message is lost or delayed vs. how bad is it if a user can't post or read a message), or if you will do updates to your data (e.g. data in Riak is an opaque blob, to change it you need to read it and write it back, in Cassandra, HBase and MongoDB you can add and remove properties without first reading the object). Ease of use is also an important factor, and Mongo is certainly easy to use from the programmer's perspective, and HBase is horrible, but just spend some time making your own library that encapsulates the nasty stuff, it will be worth it.
Finally, don't listen to me, try them out and see how they perform and how it feels. Make sure you try to load it as hard as you can, and make sure you test everything you will do. I've made the mistake of not testing what happens when you remove lots of data in MongoDB, and have paid for that dearly.
I would recommend looking at the presentation "Why databases suck for messaging", which mainly explains why you shouldn't use databases such as MySQL for messaging.
I think in this scenario CouchDB's changes feed may come in quite handy, although you would probably also have to create some more complex views for querying message metadata. If speed is critical, also look at Redis, which is really fast and comes with pub/sub functionality. MongoDB with its ad hoc query support may also be a decent solution for this use case.
I think you're spot-on in storing metadata along with each message! Sacrificing storage for faster retrieval time is probably the way to go. Note that it could get complicated if you ever need to change a user's metadata and propagate that to all the messages. You should consider how often that might happen, whether you'll actually need to update all the message records, and based on that whether it's worth paying the price for the sake of fewer queries (it probably is worth it, but that depends on the specifics of your system).
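The propagation cost mentioned above looks roughly like this. It is a plain-Python sketch over in-memory data; the structures, values and the function name update_user_metadata are assumptions made for the example.

```python
# Source of truth for user metadata, plus messages that each carry a copy.
users = {555: {"category_1": "gold"}, 777: {"category_1": "basic"}}
messages = [
    {"id": 1, "message": "Hi", "metadata": {"user_id": 555, "category_1": "gold"}},
    {"id": 2, "message": "Yo", "metadata": {"user_id": 777, "category_1": "basic"}},
    {"id": 3, "message": "Hey", "metadata": {"user_id": 555, "category_1": "gold"}},
]


def update_user_metadata(user_id, changes):
    """Change the user record, then rewrite every copy stored on messages.

    Returns the number of messages that had to be touched.
    """
    users[user_id].update(changes)
    touched = 0
    for msg in messages:
        if msg["metadata"]["user_id"] == user_id:
            msg["metadata"].update(changes)
            touched += 1
    return touched


touched = update_user_metadata(555, {"category_1": "silver"})
```

Reads stay a single lookup, but one profile change costs a write per message the user ever posted, so the trade only pays off when metadata changes are rare compared with reads.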
I agree with @Andrej_L that HBase isn't the right solution for this problem. Cassandra falls in with it for the same reason.
CouchDB could solve your problem, but you're going to have to define views (materialized indices) for any metadata you're going to want to query. If the whole point of not using MySQL here is to avoid indexing everything, then Couch is probably not the right solution either.
Riak would be a much better option since it queries your data using map-reduce. That allows you to build any query you like without the need to pre-index all your data as in CouchDB. Millions of rows are not a problem for Riak - no worries there. Should the need arise, it also scales very well by simply adding more nodes (and it can balance itself too, so this is really a non-issue).
So based on my own experience, I'd recommend Riak. However, unlike you, I've no direct experience with MongoDB so you'll have to judge it against Riak yourself (or maybe someone else here can answer on that).
From my experience, HBase is not a good solution for your application.
Because:
It doesn't have secondary indexes by default (you have to install plugins or something similar), so you can search efficiently only by primary key. I have implemented a secondary index using HBase and additional tables, but you can't use this in an online application, because to get a result you have to run a map/reduce job, and that takes a long time on millions of rows.
It is very difficult to maintain and tune this database. To work effectively you will use HBase with Hadoop, which needs powerful computers, or several of them.
HBase is very useful when you need to build aggregation reports over a large amount of data. It seems that you don't.
Due to the number of metadata attributes and the fact any number can be included in a query, creating SQL indexes here doesn't seem like a good idea.
It sounds like you need a join, so you can mostly forget about CouchDB till they sort out the multiview code that was worked on (not actually sure it is still worked on).
Riak can query as fast as you make it; it depends on the nodes.
Mongo will let you create an index on any field, even if that is an array.
CouchDB is very different: it builds indexes using a stored map-reduce (but without the reduce) that they call a "view".
RethinkDB will let you have SQL but a little faster.
TokuDB will too.
Redis will kill all in speed, but it's entirely stored in RAM.
Single-level relations can be done in all of them, but differently for each.

Using MongoDB and MySQL in unison

Some parts of my web app would work very well with an RDBMS, such as user and URL handling - I want to normalize users, emails, hosts (i.e. stackoverflow.com), and URLs (i.e. https://stackoverflow.com/questions/ask) so that updating things in one place updates them in all places, and to minimize redundancy.
But some parts of my web app would work very well with a document-based database, like Mongo, because they have a lot of components that would work more efficiently as embedded objects.
Would it make sense to use MySQL for the relational objects and Mongo for the document objects, or would it be not worth the hassle to have to manage two types of databases? I know that Mongo has references, but I get the idea that it is not really designed and optimized for references.
Thanks!
PS: I read this: Using combination of MySQL and MongoDB and it scratches the edge of what I am asking, but it is really a completely different question.
We use Mongo and MySQL in unison. Yes, there is additional maintenance involved, but it is about using the right tool for the right job. We use Mongo for a more real-time scenario where we need fast reads and writes and can do without persisting data for long periods of time, and MySQL for everything else.
That being said, your needs may be unique and you need to figure out the right tool for the job.
I recently built a system using MySQL as the RDBMS managing users and blogging, and MongoDB for searchable attributes. It works well; however, keeping data in sync, especially user IDs etc., requires a bit of work. It is basically a case of choosing the right tool for the job.