Where to use MongoDB and where MySQL? - mysql

I am choosing between two databases, MySQL and MongoDB. I plan to store text and numeric data, and I will build my app in RoR.
I don't know which database system would be better for this purpose. Can you help me, please: by which criteria should I decide?

Let me cast this question in a more general setting and put it into some historical perspective.
In the 60s they were asking whether to use a hierarchical or a network database
In the 70s the debate was relational against network
In the 80s relational turned into SQL databases, so the question mutated to SQL vs. network
In the 90s it was SQL against object databases
In the 00s it was SQL against XML databases
Today we have SQL vs. NoSQL
Do you see a pattern here? Would you still bet some money on an SQL competitor, especially if it's nothing more than a glorified hash table?

I have also used MySQL and MongoDB (with Mongoid) in my projects, and I can say that if you want to keep binary data like images, MP3s and similar files in your database, try Mongo; for everything else you can use an SQL database. MongoDB has no fixed structure: you are working with a hash, so you can dynamically add and remove keys/columns.
In your case I would use MySQL.

In my opinion you should base your decision on the purpose of your application. Do you want to search through your text data? How will you define keys? There is little use in going for MySQL if you have to request each record and scan it. Even if there is functionality to do text scans in MySQL (does it have that?), MongoDB will probably do the job more efficiently. The other way around, if you are not going to use MongoDB's strong points then you might as well go for MySQL.
Another factor might be the deadline for implementing something. If you need it fast, don't waste time on learning something new. If you have time to experiment, figure out the key features you will most likely rely upon in your application.

I think that if you need a hard structure you should use MySQL, because that is its nature, but if you need something more dynamic, with no structure at all (schema-less), you should use MongoDB. I've never used MongoDB, but I know it's more object/document oriented.

It would be helpful if you could provide some more detail. Would your data easily fit into a schema, or do you need the flexibility that a document store offers? What about auto-sharding, etc? Without more information, no one can give you advice that fits your needs. Lacking that, you can't hope for feedback any better than people's personal preferences, which is little more than a flamewar waiting to happen.

Related

How to migrate data from mongodb to mysql?

I am currently working on an analytics-like application. I have an AngularJS app that communicates with a Spring REST client app, in which a user creates a token (trackingID) and puts the generated script with this ID on their website to collect information about visitors' actions through another Spring REST tracking app. For the tracking app I am using MongoDB to collect visitor actions/visitor info, for fast insertion; for the REST client app I am using MySQL for user/account details.
My question is how to migrate the Mongo data from the tracking app to MySQL, so that I can use joins to analyze the data easily and quickly with any kind of filter from the AngularJS client app. Should I manually create workers that periodically transfer data, from the last transferred point up to the present state, from Mongo to MySQL, or are there existing tools that can be set up for this transfer?
There is no official library to do this.
But you can use MongoDB's mongoexport tool to export the data in CSV format and mysqlimport to import it into MySQL.
See the documentation for mysqlimport and mongoexport.
Another method is to write a program in your favorite language that reads from MongoDB and writes into MySQL.
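As a rough illustration of the "write a program" route, here is a minimal Python sketch. Everything in it is an assumption made for the example: the document shape, the `visitor_events` table and its columns are hypothetical, a plain list stands in for the documents you would read from MongoDB, and the standard-library `sqlite3` module stands in for MySQL so that the sketch runs on its own. With a real setup you would swap in your MongoDB and MySQL drivers.

```python
import sqlite3

# Stand-in for documents read from a MongoDB collection (hypothetical shape).
visitor_events = [
    {"_id": "a1", "tracking_id": "T-1", "action": "click",
     "meta": {"page": "/home", "ms": 120}},
    {"_id": "a2", "tracking_id": "T-1", "action": "scroll",
     "meta": {"page": "/home"}},
]

# Target columns, decided by a human who knows the data (an assumption here).
COLUMNS = ["_id", "tracking_id", "action", "meta_page", "meta_ms"]


def flatten(doc, prefix=""):
    """Turn nested dicts into one flat dict with underscore-joined keys."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


def load(conn, docs):
    """Create the table if needed and upsert one row per document."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS visitor_events ("
        "_id TEXT PRIMARY KEY, tracking_id TEXT, action TEXT, "
        "meta_page TEXT, meta_ms INTEGER)"
    )
    # Keys missing from a document become NULL in the row.
    rows = [tuple(flatten(doc).get(col) for col in COLUMNS) for doc in docs]
    # INSERT OR REPLACE is SQLite syntax; MySQL would use REPLACE or
    # INSERT ... ON DUPLICATE KEY UPDATE instead.
    conn.executemany(
        "INSERT OR REPLACE INTO visitor_events VALUES (?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load(conn, visitor_events)
```

A periodic worker could call `load` with only the documents created since the last run; because the insert is an upsert keyed on `_id`, re-running it over the same documents does not create duplicates.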
MySQL 5.7 has a new JSON data type, which can be very convenient.
You can create a table in MySQL to receive the JSON messages as is, and then use SQL to query it or do post-processing to load the data into a structured set of database tables.
Check this out: https://dev.mysql.com/doc/refman/5.7/en/json.html
I realise this question is a few years old - but recently I've had a number of people enquiring whether a tool I developed (https://virtual.blue/apps/json-converter) can do exactly what the OP is asking (convert MongoDB to SQL) so I am guessing it is still something people want. Keep reading to find out why I am honestly not surprised by this.
The short answer to whether the tool can help you is: perhaps. If your existing data relationships are not too complicated, and your database is not enormous, it may well be worth a try.
However, I thought it might help to try and explain what the issues are with this kind of conversion, since all the answers I have seen so far are along the lines of "try tool X" or "first convert to format Y and then you can slurp it into MySQL using utility Z", i.e. there is no thought given to whether what you get at the end of doing this is going to make sense in terms of data relationships and integrity.
For example, you could just stick your entire database dump in a single field of a single SQL table (ok space limitations might prevent this in reality, but hopefully you get my point). Then your database would be "in MySQL format", but it would be absolutely no use to anyone.
The point is, what you actually want is a fully defined database model, correctly encapsulating all of the intrinsic data relationships. ("Database normalization" as it is known.) If your conversion process gets those relationships wrong, then you have a broken model, and any queries you try to run over it are likely to return nonsense. Unfortunately there is no magic tool that is just going to "know" the best way to represent your data in MySQL, and closing your eyes and shovelling it into a bunch of random tools is unlikely to miraculously get you what you want.
And herein lies the fundamental problem with the "NoSQL" philosophy (fad). They sold people the bogus notion of "non-relational data". My first thought when I heard this was, "How does that work? Surely all data is relational?" By the looks of things we are steadily getting more and more evidence that my instincts were right. ("NoSQL? Why stop there? I go with 'NoDatabase'. It returns no results at all, but it sure is fast!")
The NoSQL madness throws several important fundamental engineering principles to the wind. We shouted "don't hard code!", "DRY!" (Don't Repeat Yourself) because these actions infuse inflexibility into systems. Traditional wisdom makes precisely the same flexibility argument when it advises "create a fully described model with all the data relationships represented". Then you can execute any arbitrary query over it and expect meaningful results. "Yes but there are a whole bunch of queries we are never going to need to run," says the NoSQL proponent. But surely we learnt our lesson on things we are "never going to need to do"? ("I hard code liberally, because I know I am never going to want to change my code." Hmm...)
The arguments about speed are largely moot. Say it turns out you are frequently doing a complex 9 table join, with unsurprisingly sluggish performance. So create an index. Cache it. Swap some disk space for speed. The NoSQL philosophy is to swap data integrity for speed, which makes no sense at all.
When you generate your fast lookup index (cache/table/map/whatever) what you are really doing is creating a view over your model. If your model changes, you can readily update your view. Going from a model to a view is easy - it's a one to many operation and you are on the right side of entropy.
However, when you went with MongoDB you effectively decided to create views without bothering to describe your fundamental model. Now you discover there are queries you want to run, but can't - and so it's no wonder you want to move over to SQL and actually have your data modelled correctly. The problem is you now want to go from a view to a model. Now you're on the wrong side of entropy. Your view is a lossy representation of the model's fundamental relationships. You can't expect a tool to "translate" your database, because you are asking it to insert new relationships which were not originally defined. These are real world relationships that are not machine-guessable. The tool cannot know what relationships were intended.
In short the only way you can do this reliably is to get your hands dirty. An intelligent human, with complete understanding of the system you are modelling needs to sit down and carefully come up with (possibly a substantial amount of) code which effectively picks through the data and resolves all of the insufficiently represented data relationships. If your data is complex then it's going to be a headache and there is no way to cheat.
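To make "resolving insufficiently represented relationships" a bit more concrete, here is a small Python sketch. The data and field names are invented for illustration. The key point is the line that decides an email address identifies a customer: that is a human modelling decision, not something the code could have guessed, and the sketch only reports conflicts rather than resolving them.

```python
# Hypothetical denormalized documents: each order embeds a copy of its customer.
orders = [
    {"order_id": 1, "total": 30,
     "customer": {"email": "ann@example.com", "name": "Ann"}},
    {"order_id": 2, "total": 12,
     "customer": {"email": "ann@example.com", "name": "Ann B."}},
    {"order_id": 3, "total": 7,
     "customer": {"email": "bob@example.com", "name": "Bob"}},
]


def normalize(docs):
    """Split embedded customers into their own table, keyed by email.

    Assumption (made by a human, not derivable from the data): two embedded
    customers with the same email are the same real-world customer.
    """
    customers = {}    # email -> customer row
    order_rows = []   # orders referencing customers by surrogate key
    conflicts = []    # embedded copies that disagree with the first one seen
    for doc in docs:
        embedded = doc["customer"]
        email = embedded["email"]
        if email not in customers:
            customers[email] = {"customer_id": len(customers) + 1, **embedded}
        known = customers[email]
        if known["name"] != embedded["name"]:
            # The copies have drifted apart; a person has to decide which wins.
            conflicts.append((doc["order_id"], known["name"], embedded["name"]))
        order_rows.append({
            "order_id": doc["order_id"],
            "total": doc["total"],
            "customer_id": known["customer_id"],
        })
    return list(customers.values()), order_rows, conflicts


customer_rows, order_rows, conflicts = normalize(orders)
```

Here `conflicts` ends up holding order 2, where the embedded name differs from the first copy seen. That list is exactly the part no tool can settle for you.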
If your data is still relatively simple then I would suggest making the conversion as soon as possible, before it becomes difficult. In this case my tool (https://virtual.blue/apps/json-converter) may be able to help.
(They really should have asked a Physicist before they came up with all this nonsense...!)
You can download a trial version of Studio 3T for Mongo and export your database to SQL (or JSON) directly

How to handle ever changing database structure

I am working on my masters thesis. For my implementation I have some MySQL tables.
With every iteration my table structure will differ (adding, removing columns etc). I was wondering what the best way is to handle the ever changing structure, without changing old code too much.
I read that Facebook has a version control system where they can specify exactly what kind of code/feature is available and for which user. As far as I know that must mean that they manage many different database structures at once. How does their old code work alongside their new code with respect to their database? Do they do a lot of testing? Did they abandon MySQL altogether?
Personally I like FriendFeed's solution a lot. However, I am wondering if it is too much for me.
Why would anyone try to use a relational database for non-relational data?
Forget about FriendFeed and take a look at NoSQL solutions. They are schemaless, they support horizontal scalability much better than any RDBMS, and most of them are free/open source.
I can recommend MongoDB. It's very fast and written in C++, but not ACID compliant.
You could also try RavenDB. It's not as fast as MongoDB and inserts are very slow compared to Mongo, but it's ACID compliant. It is written in .NET.

Which is the right database for the job?

I am working on a feature and could use opinions on which database I should use to solve this problem.
We have a Rails application using MySQL. We have no issues with MySQL and it runs great. But for a new feature, we are deciding whether to stay MySQL or not. To simplify the problem, let's assume there is a User and Message model. A user can create messages. The message is delivered to other users based on their association with the poster.
Obviously there is an association based on friendship but there are many many more associations based on the user's profile. I plan to store some metadata about the poster along with the message. This way I don't have to pull the metadata each time when I query the messages.
Therefore, a message might look like this:
{
id: 1,
message: "Hi",
created_at: 1234567890,
metadata: {
user_id: 555,
category_1: null,
category_2: null,
category_3: null,
...
}
}
When I query the messages, I need to be able to query based on zero or more metadata attributes. This call needs to be fast and occurs very often.
Due to the number of metadata attributes and the fact any number can be included in a query, creating SQL indexes here doesn't seem like a good idea.
Personally, I have experience with MySQL and MongoDB. I've started research on Cassandra, HBase, Riak and CouchDB. I could use some help from people who might have done the research as to which database is the right one for my task.
And yes, the messages table can easily grow into millions of rows.
This is a very open-ended question, so all we can do is give advice based on experience. The first thing to consider is whether it's a good idea to decide on using something you haven't used before, instead of using MySQL, which you are familiar with. It's boring not to use shiny new things when you have the opportunity, but believe me that it's terrible when you've painted yourself into a corner because you thought that the new toy would do everything it said on the box. Nothing ever works the way it says in the blog posts.
I mostly have experience with MongoDB. It's a terrible choice unless you want to spend a lot of time trying different things and realizing they don't work. Once you scale up a bit you basically can't use things like secondary indexes, updates, and other things that make Mongo an otherwise awesomely nice tool (most of this has to do with its global write lock and the database format on disk, it basically sucks at concurrency and fragments really easily if you remove data).
I don't agree that HBase is out of the question. It doesn't have secondary indexes, but you can't use those anyway once you get above a certain traffic load. The same goes for Cassandra (which is easier to deploy and work with than HBase). Basically you will have to implement your own indexing whichever solution you choose.
What you should consider is things like if you need consistency over availability, or vice versa (e.g. how bad is it if a message is lost or delayed vs. how bad is it if a user can't post or read a message), or if you will do updates to your data (e.g. data in Riak is an opaque blob, to change it you need to read it and write it back, in Cassandra, HBase and MongoDB you can add and remove properties without first reading the object). Ease of use is also an important factor, and Mongo is certainly easy to use from the programmer's perspective, and HBase is horrible, but just spend some time making your own library that encapsulates the nasty stuff, it will be worth it.
Finally, don't listen to me, try them out and see how they perform and how it feels. Make sure you try to load it as hard as you can, and make sure you test everything you will do. I've made the mistake of not testing what happens when you remove lots of data in MongoDB, and have paid for that dearly.
I would recommend looking at the presentation "Why databases suck for messaging", which mainly explains why you shouldn't use databases such as MySQL for messaging.
I think in this scenario CouchDB's changes feed may come in quite handy, although you would probably also have to create some more complex views based on querying message metadata. If speed is critical, also look at Redis, which is really fast and comes with pub/sub functionality. MongoDB with its ad hoc query support may also be a decent solution for this use case.
I think you're spot-on in storing metadata along with each message! Sacrificing storage for faster retrieval time is probably the way to go. Note that it could get complicated if you ever need to change a user's metadata and propagate that to all the messages. You should consider how often that might happen, whether you'll actually need to update all the message records, and based on that whether it's worth paying the price for the sake of fewer queries (it probably is worth it, but that depends on the specifics of your system).
I agree with @Andrej_L that HBase isn't the right solution for this problem. Cassandra falls in with it for the same reason.
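To show what that propagation cost looks like, here is a tiny Python sketch. The message shape follows the one in the question; the category values and the `propagate_profile_change` helper are made up for the example, and a plain list stands in for the message store.

```python
# Messages carrying a denormalized copy of the poster's metadata
# (shape follows the question; the category values are invented).
messages = [
    {"id": 1, "message": "Hi",
     "metadata": {"user_id": 555, "category_1": "student"}},
    {"id": 2, "message": "Lunch?",
     "metadata": {"user_id": 555, "category_1": "student"}},
    {"id": 3, "message": "Hello",
     "metadata": {"user_id": 777, "category_1": "staff"}},
]


def propagate_profile_change(messages, user_id, changes):
    """Rewrite the embedded metadata on every message posted by user_id.

    Returns the number of messages touched, which is the real cost of the
    denormalization: one write per message instead of one write per user.
    """
    touched = 0
    for msg in messages:
        if msg["metadata"]["user_id"] == user_id:
            msg["metadata"].update(changes)
            touched += 1
    return touched


touched = propagate_profile_change(messages, 555, {"category_1": "alumni"})
```

If old messages are supposed to keep the metadata the poster had at the time of posting, you can skip this step entirely, which is one reason the trade-off depends on your system.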
CouchDB could solve your problem, but you're going to have to define views (materialized indices) for any metadata you're going to want to query. If the whole point of not using MySQL here is to avoid indexing everything, then Couch is probably not the right solution either.
Riak would be a much better option since it queries your data using map-reduce. That allows you to build any query you like without the need to pre-index all your data as in couch. Millions of rows are not a problem for Riak - no worries there. Should the need arise, it also scales very well by simply adding more nodes (and it can balance itself too, so this is really a non-issue).
So based on my own experience, I'd recommend Riak. However, unlike you, I've no direct experience with MongoDB, so you'll have to judge it against Riak yourself (or maybe someone else here can answer on that).
From my experience, HBase is not a good solution for your application.
Because:
It doesn't include a secondary index by default (you have to install plugins or something similar), so you can search effectively only by primary key. I have implemented a secondary index using HBase and additional tables, but you can't use that in an online application: to get a result you have to run a map/reduce job, which takes a long time on millions of records.
It's very difficult to support and tune this database. To work effectively you will use HBase with Hadoop, which requires a powerful computer or several of them.
HBase is very useful when you need to build aggregation reports over large amounts of data. It seems that you don't need that.
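For readers wondering what "implement your own indexing" with additional tables can look like in a store that only supports lookup by primary key, here is a minimal in-memory Python sketch. It is not HBase code: dictionaries stand in for the primary table and the index tables, and the `IndexedStore` class is purely hypothetical. It maintains the index on every write, which is one possible design and not necessarily the one described above.

```python
from collections import defaultdict


class IndexedStore:
    """Key-value store with hand-maintained secondary indexes."""

    def __init__(self, indexed_fields):
        self.rows = {}  # "primary table": row key -> record
        # One "index table" per field: field value -> set of row keys.
        self.indexes = {field: defaultdict(set) for field in indexed_fields}

    def put(self, key, record):
        old = self.rows.get(key)
        if old is not None:
            # Remove stale index entries before re-indexing the new version.
            for field, index in self.indexes.items():
                if field in old:
                    index[old[field]].discard(key)
        self.rows[key] = record
        for field, index in self.indexes.items():
            if field in record:
                index[record[field]].add(key)

    def find(self, field, value):
        """Look up records by an indexed field without scanning all rows."""
        keys = self.indexes[field].get(value, set())
        return [self.rows[key] for key in sorted(keys)]


store = IndexedStore(["category_1"])
store.put("m1", {"message": "Hi", "category_1": "a"})
store.put("m2", {"message": "Yo", "category_1": "b"})
store.put("m1", {"message": "Hi again", "category_1": "b"})  # overwrite moves m1
```

The price is that every write touches the index tables as well, and in a distributed store keeping the two in sync is where most of the difficulty lies.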
"Due to the number of metadata attributes and the fact any number can be included in a query, creating SQL indexes here doesn't seem like a good idea."
It sounds like you need a join, so you can mostly forget about CouchDB till they sort out the multiview code that was worked on (not actually sure it is still worked on).
Riak can query as fast as you make it, depends on the nodes
Mongo will let you create an index on any field, even if that is an array
CouchDB is very different: it builds indexes using a stored Map-Reduce (but without the reduce), which they call a "view"
RethinkDB will let you have SQL but a little faster
TokuDB will too
Redis will kill all in speed, but it's entirely stored in RAM
single level relations can be done in all of them, but differently for each.

XML or MySQL: which should be used for storing connected data?

I am writing code for a friend list and messaging system for my college website. I need to store interconnected data and search it. It has about 3500 records. So which way should I proceed, MySQL or XML? Which is fastest? Which is best? Why?
I'm going to use one of my professor's favorite answers here: "it depends."
XML and MySQL have very different applications. If you need to be doing lots of simultaneous queries for all sorts of sophisticated things, MySQL is your clear winner. Sometimes MySQL can be hard to use in some applications because you must first create a database schema in which to fit your data. It sounds, though, like you have many records with the same structure, and it would be easy enough to throw them into a database. With a SQL-based database engine like MySQL, you can also construct queries using the standard SQL language. Database optimizations can also help to increase the performance of these types of queries; for example, you can use indexes and keys. If your data needs to be updated regularly, then MySQL will likely provide better performance, as it will not have to rewrite the XML file. If you need your application to scale to many simultaneous connections running sophisticated queries, you are definitely going to want to go with some sort of SQL solution.
Depending upon your application though, sometimes there are other ways to store and access your data. I for one once needed to create a persistent data structure on disk which could be accessed very quickly, but never updated. For that, I used cdb. There are also other database systems out there like the Berkeley database, and some NoSQL solutions such as CouchDB and MongoDB. I posed a somewhat interesting question here on Stack Overflow on the use of NoSQL solutions a little while back, which you may find interesting as well.
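As a small illustration of the "schema, keys and indexes" point for a friend list, here is a Python sketch. The table and column names are assumptions, and the standard-library `sqlite3` module stands in for MySQL so the example runs on its own. The same tables and query should carry over to MySQL with little or no change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- One row per direction of a friendship; the composite primary key
-- prevents duplicates and serves lookups by user_id.
CREATE TABLE friendships (
    user_id   INTEGER NOT NULL REFERENCES users(id),
    friend_id INTEGER NOT NULL REFERENCES users(id),
    PRIMARY KEY (user_id, friend_id)
);
-- Extra index for the reverse lookup ("who lists me as a friend?").
CREATE INDEX idx_friendships_friend ON friendships (friend_id);
""")

conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "Asha"), (2, "Ben"), (3, "Chen")])
conn.executemany("INSERT INTO friendships VALUES (?, ?)",
                 [(1, 2), (2, 1), (1, 3), (3, 1)])


def friends_of(conn, user_id):
    """Return the names of a user's friends, resolved with a join."""
    rows = conn.execute(
        "SELECT u.name FROM friendships f "
        "JOIN users u ON u.id = f.friend_id "
        "WHERE f.user_id = ? ORDER BY u.name",
        (user_id,),
    )
    return [name for (name,) in rows]


asha_friends = friends_of(conn, 1)
```

With a few thousand records either approach would be fast enough; the join is what you would otherwise have to hand-code against an XML file.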
This is really just a sampling of different considerations you may want to make when you are choosing how you want to store your data. Think about questions like: How frequently will things be queried? or updated? What will your queries look like? What kinds of applications do you need to access your information from? etc.

What kinds of data queries are too hard to do on CouchDB (as opposed to SQL)? Seeking concrete examples

I think CouchDB is really cool and want to use it more. But I'd also like to know ahead of time whether there are any types of data query that are done easily on MySQL but are impossible or very awkward to accomplish in CouchDB.
Please answer with concrete answers or examples instead of just saying that "CouchDB is for documents and MySQL is for relational data." I don't really know what that statement means, since it seems that you can do things functionally equivalent to relational MySQL joins with CouchDB views.
For example, I've read that paginating through a data set is a bit awkward in CouchDB. This is the sort of answer I'm looking for.
A problem I'm having at the moment is displaying an AJAX grid with contents from a CouchDB database. The equivalent SQL request would be:
SELECT * FROM the_table
WHERE {filter_col} = {filter_value} [ AND ... ]
ORDER BY {order_col}
LIMIT {n} OFFSET {m}
It's a pretty simple request to run on a traditional SQL database, but having to perform filtering, ordering and paging all together at the same time is beyond what CouchDB indexing can manage - at least, without creating an insane number of different views.
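For comparison, here is roughly what the SQL side of that request looks like when driven from code. This is a Python sketch using the standard-library `sqlite3` module as a stand-in for any SQL database; the `name` and `status` columns, the sample rows and the `fetch_page` helper are invented for the example.

```python
import sqlite3

# Column names cannot be bound as query parameters, so they are checked
# against a whitelist before being placed into the SQL text.
ALLOWED_COLUMNS = {"id", "name", "status"}


def fetch_page(conn, filters, order_col, limit, offset):
    """Filter, order and page in a single query.

    filters maps column name -> required value and may be empty.
    """
    if order_col not in ALLOWED_COLUMNS or not set(filters) <= ALLOWED_COLUMNS:
        raise ValueError("unexpected column name")
    sql = "SELECT id, name, status FROM the_table"
    if filters:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in filters)
    sql += f" ORDER BY {order_col} LIMIT ? OFFSET ?"
    return conn.execute(sql, [*filters.values(), limit, offset]).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE the_table (id INTEGER PRIMARY KEY, name TEXT, status TEXT)"
)
conn.executemany("INSERT INTO the_table VALUES (?, ?, ?)", [
    (1, "delta", "open"), (2, "alpha", "open"), (3, "charlie", "closed"),
    (4, "bravo", "open"), (5, "echo", "open"),
])

# Second page of size 2 among the "open" rows, ordered by name.
page = fetch_page(conn, {"status": "open"}, "name", limit=2, offset=1)
```

Any combination of filter columns and sort column goes through the same function, which is the flexibility that would otherwise need a separate CouchDB view per combination.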
CouchDB has a hard time with full-text searches (unless external software is used). MySQL isn't particularly good at that either, but Couch is even worse.
CouchDB isn't going to do a good job when your data model implies multiple, complex relations between objects. After all, it's a document-based system, not a relational DBMS.
Other than that, IMO couch rules.
EDIT: Particularly when you need to relax, of course! :)
It all depends on the motivation behind changing data stores. What problem or architectural challenge are you trying to overcome with MySQL that CouchDB can solve? If at the end of the day there is no difference in functionality or performance then the refactoring to change database platforms cannot be justified.
Have a look at some ORM frameworks, which if implemented correctly can let you swap out the back end databases easily.