MongoDB or MySQL - mysql

I am working on a website which would be having all the restaurant related details for a particular country. I was considering which DB would be best suitable for this kind of scenario,very similar to this.
I was considering to use MongoDB just because it would provide me with flexible schema and Simple queries for data retrieval. I am rethinking over my decision as neither my data is going to be too large as of my now so there wont be nay blockage for me w.r.t data size in MySQL.
What would be best way to choose between the 2.

It depends truly on whether you want data integrity and ACID features of a Relational Database. If a Relational Database is built correctly using the Relational Model and E.F Codd's rules, you will never have a problem with having duplicate data, inconsistencies, and other maladies.(This is assuming you use a RDBMS that is worth its salt, like oracle or SAP ASE)
However, you also have the option of MongoDB, which as you pointed out, is very flexible. However, through my experience, you will have to do a lot more manual work ensuring data accuracy and integrity.
However, certain things are easier in it, and it is by no means not successful. I use Mongo as a data back end for simulation servers I run, and it performs beautifully. Where mongo truly exceeds is with its atomic documents, and that's where Mongo pulled ahead of other NoSQL systems like CouchDB.
What it truly comes down to is what kind of data you are storing. If you are storing relational data, use a RDBMS. If it is more document based, use Mongo or a similar data storage engine. I do not like the idea of choosing a data storage engine by what is popular or what is new. Use what fits your data.
I hope this answers you question satisfactorily, if not please comment below.

Related

Store JSON data as Text in MySQL

This is more of a concept/database architecture related question. In order to maintain data consistency, instead of a NoSQL data store, I'm just storing JSON objects as strings/Text in MySQL. So a MySQL row will look like this
ID, TIME_STAMP, DATA
I'll store JSON data in the DATA field. I won't be updating any rows, instead I'll add new rows with the current time stamp. So, when I want the latest data I just fetch the row with the max(timestamp). I'm using Tornado with the Python MySQLDB driver as my primary backend application.
I find this approach very straight forward and less prone to errors. The JSON objects are fairly simple and are not nested heavily.
Is this approach fundamentally wrong ? Are there any issues with storing JSON data as Text in MySQL or should I use a file system based storage such as HDFS. Please let me know.
MySQL, as you probably know, is a relational database manager. It is designed for being used in a way where data is related to each other through keys, forming relations which can then be used to yield complex retrieval of data. Your method will technically work (and be quite fast), but will probably (based on what I've seen so far) considerably impair your possibility of leveraging the technology you're using, should you expand the complexity of your scope!
I would recommend you use a database like Redis or MongoDB as they are designed for document storage rather than relational architectures.
That said, if you find the approach works fine for what you're building, just go ahead. You might face some blockers up ahead if you want to add complexity to your solution but either way, you'll learn something new! Good luck!
Pradeeb, to help answer your question you need to analyze your use case. What kind of data are you storing? For me, this would be the deciding factor: every technology has its specific use case where it excels at.
I think it is safe to assume that you use JSON since your data structure needs to very flexible documents, compared to a traditional relational DB. There are certain data stores that natively support such data structures, such as MongoDB (they call it "binary JSON" or BSON) as Phil pointed out. This would give you improved storage and/or improved search capabilities. Again, the utility depends entirely on your use case.
If you are looking for something like a job queue and horizontal scalability is not an issue and you just need fast access of the latest you could use RedisDB, an in-memory key value store, that has a hash (associative array) data type and lists for this kind of thing. Alternatively, since you mentioned HDFS and horizontal scalability may very well be an issue, I can recommend using queue systems like Apache ActiveMQ or RabbitMQ.
Lastly, if you are writing heavily, and your are not client limited but your data storage is your bottle neck: look into distributed, flexible-schema data storage like HBase or Cassandra. They offer flexible data schemas, are heavily write optimized, and data can be appended and remains in chronological order, so you can fetch the newest data efficiently.
Hope that helps.
This is not a problem. You can also use memcached storage engine in modern MySQL which would be perfect. Although I have never tried that.
Another approach is to use memcached as cache. Write everything to both memcached, and also mysql. When you go to read data, try reading from memcached. If it does not exist, read from mysql. This is a common technique to reduce database bottleneck.

Using MongoDB vs MySQL with lots of JSON fields?

There is a microblogging type of application. Two main basic database stores zeroed upon are:
MySQL or MongoDB.
I am planning to denormalize lot of data I.e. A vote done on a post is stored in a voting table, also a count is incremented in the main posts table. There are other actions involved with the post too (e.g. Like, vote down).
If I use MySQL, some of the data better suits as JSON than fixed schema, for faster lookups.
E.g.
POST_ID | activity_data
213423424 | { 'likes': {'count':213,'recent_likers' :
['john','jack',..fixed list of recent N users]} , 'smiles' :
{'count':345,'recent_smilers' :
['mary','jack',..fixed list of recent N users]} }
There are other components of the application as well, where usage of JSON is being proposed.
So, to update a JSON field, the sequence is:
Read the JSON in python script.
Update the JSON
Store the JSON back into MySQL.
It would have been single operation in MongoDB with atomic operations like $push,$inc,$pull etc. Also
document structure of MongoDB suits my data well.
My considerations while choosing the data store.
Regarding MySQL:
Stable and familiar.
Backup and restore is easy.
Some future schema changes can be avoided using some fields as schemaless JSON.
May have to use layer of memcached early.
JSON blobs will be static in some tables like main Posts, however will be updated alot in some other tables like Post votes and likes.
Regarding MongoDB:
Better suited to store schema less data as documents.
Caching might be avoided till a later stage.
Sometimes the app may become write intensive, MongoDB can perform better at those points where unsafe writes are not an issue.
Not sure about stability and reliability.
Not sure about how easy is it to backup and restore.
Questions:
Shall we chose MongoDB if half of data is schemaless, and is being stored as JSON if using MySQL?
Some of the data like main posts is critical, so it will be saved using safe writes, the counters etc
will be saved using unsafe writes. Is this policy based on importance of data, and write intensiveness correct?
How easy is it to monitor, backup and restore MongoDB as compared to MySQL? We need to plan periodic backups ( say daily ), and restore them with ease in case of disaster. What are the best options I have with MongoDB to make it a safe bet for the application.
Stability, backup, snapshots, restoring, wider adoption I.e.database durability are the reasons pointing me
to use MySQL as RDBMS+NoSql even though a NoSQL document storage could serve my purpose better.
Please focus your views on the choice between MySQL and MongoDB considering the database design I have in mind. I know there could be better ways to plan database design with either RDBMS or MongoDB documents. But that is not the current focus of my question.
UPDATE : From MySQL 5.7 onwards, MySQL supports a rich native JSON datatype which provides data flexibility as well as rich JSON querying.
https://dev.mysql.com/doc/refman/5.7/en/json.html
So, to directly answer the questions...
Shall we chose mongodb if half of data is schemaless, and is being stored as JSON if using MySQL?
Schemaless storage is certainly a compelling reason to go with MongoDB, but as you've pointed out, it's fairly easy to store JSON in a RDBMS as well. The power behind MongoDB is in the rich queries against schemaless storage.
If I might point out a small flaw in the illustration about updating a JSON field, it's not simply a matter of getting the current value, updating the document and then pushing it back to the database. The process must all be wrapped in a transaction. Transactions tend to be fairly straightforward, until you start denormalizing your database. Then something as simple as recording an upvote can lock tables all over your schema.
With MongoDB, there are no transactions. But operations can almost always be structured in a way that allow for atomic updates. This usually involves some dramatic shifts from the SQL paradigms, but in my opinion they're fairly obvious once you stop trying to force objects into tables. At the very least, lots of other folks have run into the same problems you'll be facing, and the Mongo community tends to be fairly open and vocal about the challenges they've overcome.
Some of the data like main posts is critical , so it will be saved using safe writes , the counters etc will be saved using unsafe writes. Is this policy based on importance of data, and write intensiveness correct?
By "safe writes" I assume you mean the option to turn on an automatic "getLastError()" after every write. We have a very thin wrapper over a DBCollection that allows us fine grained control over when getLastError() is called. However, our policy is not based on how "important" data is, but rather whether the code following the query is expecting any modifications to be immediately visible in the following reads.
Generally speaking, this is still a poor indicator, and we have instead migrated to findAndModify() for the same behavior. On the occasion where we still explicitly call getLastError() it is when the database is likely to reject a write, such as when we insert() with an _id that may be a duplicate.
How easy is it to monitor,backup and restore Mongodb as compared to mysql? We need to plan periodic backups (say daily), and restore them with ease in case of disaster. What are the best options I have with mongoDb to make it a safe bet for the application?
I'm afraid I can't speak to whether our backup/restore policy is effective as we have not had to restore yet. We're following the MongoDB recommendations for backing up; #mark-hillick has done a great job of summarizing those. We're using replica sets, and we have migrated MongoDB versions as well as introduced new replica members. So far we've had no downtime, so I'm not sure I can speak well to this point.
Stability,backup,snapshots,restoring,wider adoption i.e.database durability are the reasons pointing me to use MySQL as RDBMS+NoSql even though a NoSQL document storage could serve my purpose better.
So, in my experience, MongoDB offers storage of schemaless data with a set of query primitives rich enough that transactions can often be replaced by atomic operations. It's been tough to unlearn 10+ years worth of SQL experience, but every problem I've encountered has been addressed by the community or 10gen directly. We have not lost data or had any downtime that I can recall.
To put it simply, MongoDB is hands down the best data storage ecosystem I have ever used in terms of querying, maintenance, scalability, and reliability. Unless I had an application that was so clearly relational that I could not in good conscience use anything other than SQL, I would make every effort to use MongoDB.
I don't work for 10gen, but I'm very grateful for the folks who do.
I'm not going to comment on the comparisons (I work for 10gen and don't feel it's appropriate for me to do so), however, I will answer the specific MongoDB questions so that you can better make your decision.
Back-Up
Documentation here is very thorough, covering many aspects:
Block-Level Methods (LVM makes it very easy and quite a lot of folk do this)
With/Without Journaling
EBS Snapshots
General Snapshots
Replication (technically not back-up, however, a lot of folk use replica sets for their redundancy and back-up - not recommending this but it is done)
Until recently, there is no MongoDB equivalent of mylvmbackup but a nice guy wrote one :) In his words
Early days so far: it's just a glorified shell script and needs way more error checking. But already it works for me and I figured I'd share the joy. Bug reports, patches & suggestions welcome.
Get yourself a copy from here.
Restores
Formats etc
mongodump is completely documented here and mongorestore is here.
mongodump will not contain the indexes but does contain the system.indexes collection so mongorestore can rebuild the indexes when you restore the bson file. The bson file is the actual data whereas mongoexport/mongoimport are not type-safe so it could be anything (techically speaking) :)
Monitoring
Documented here.
I like Cacti but afaik, the Cacti templates have not kept up with the changes in MongoDB and so rely on old syntax so post 2.0.4, I believe there are issues.
Nagios works well but it's Nagios so you either love or hate it. A lot of folk use Nagios and it seems to provide them with great visiblity.
I've heard of some folk looking at Zappix but I've never used it so can't comment.
Additionally, you can use MMS, which is free and hosted externally. Your MongoDB instances run an agent and one of those agents communicate (using python code) over https to mms.10gen.com. We use MMS to view all performance statistics on the MongoDB instances and it is very beneficial from a high-level wide view as well as offering the ability to drill down. It's simple to install and you don't have to run any hardware for this. Many customers run it and some compliment it with Cacti/Nagios.
Help information on MMS can be found here (it's a very detailed, inclusive document).
One of the disadvantages of a mysql solution with stored json is that you will not be able to efficiently search on the json data. If you store it all in mongodb, you can create indexes and/or queries on all of your data including the json.
Mongo's writes work very well, and really the only thing you lose vs mysql is transaction support, and thus the ability to rollback multipart saves. However, if you are able to commit your changes in atomic operations, then there isn't a data safety issue. If you are replicated, mongo provides an "eventually consistent" promise such that the slaves will eventually mirror the master.
Mongodb doesn't provide native enforcement or cascading of certain db constructs such as foreign keys, so you have to manage those yourself (such as either through composition, which is one of mongo's strenghts), or through use of dbrefs.
If you really need transaction support and robust 'safe' writes, yet still desire the flexibility provided by nosql, you might consider a hybrid solution. This would allow you to use mysql as your main post store, and then use mongodb as your 'schemaless' store. Here is a link to a doc discussing hybrid mongo/rdbms solutions: http://www.10gen.com/events/hybrid-applications The article is from 10gen's site, but you can find other examples simply by doing a quick google search.
Update 5/28/2019
The here have been a number of changes to both MySQL and Mongodb since this answer was posted, so the pros/cons between them have become even blurrier. This update doesn't really help with the original question, but I am doing it to make sure any new readers have a bit more recent information.
MongoDB now supports transactions: https://docs.mongodb.com/manual/core/transactions/
MySql now supports indexing and searching json fields:
https://dev.mysql.com/doc/refman/5.7/en/json.html

MongoDB vs Mysql Storage space compare

I am building a data ware house that is the range of 15+ TBs. While storage is cheap, but due to limited budget we have to squeeze as much data as possible in to that space while maintaining performance and flexibility since the data format changes quiet frequently.
I tried Infobright(community edition) as a SQL solution and it works wonderful in term of storage and performance, but the limitation on data/table alteration is making it almost a no go. and infobright's pricing on enterprise version is quiet steep.
After checking out MongoDB, it seems promising except one thing. I was in a chat with a 10gen guy, and he stated that they don't really give much of a thought in term of storage space since they flatten out the data to achieve the performance and flexibility, and in their opinion storage is too cheap nowadays to be bother with.
So any experienced mongo user out there can comment on its storage space vs mysql (as it is the standard for what we comparing against to right now). if it's larger or smaller, can you give rough ratio? I know it's very situation dependent on what sort of data you put in SQL and how you define the fields, indexing and such... but I am just trying to get a general idea.
Thanks for the help in advance!
MongoDB is not optimized for small disk space - as you've said, "disk is cheap".
From what I've seen and read, it's pretty difficult to estimate the required disk space due to:
Padding of documents to allow in-place updates
Attribute names are stored in each collection, so you might save quite a bit by using abbreviations
No built in compression (at the moment)
...
IMHO the general approach is to build a prototype, insert data and see how much disk space your specific use case requires. The more realistic you can model your queries (inserts and updates) the better your result will be.
For more details see http://www.mongodb.org/display/DOCS/Excessive+Disk+Space as well.
Pros and Cons of MongoDB
For the most part, users seem to like MongoDB. Reviews on TrustRadius give the document-oriented database 8.3 out of 10 stars.
Some of the things that authenticated MongoDB users say they like about the database include its:
Scalability.
Readable queries.
NoSQL.
Change streams and graph queries.
A flexible schema for altering data elements.
Quick query times.
Schema-less data models.
Easy installation.
Users also have negative things to say about MongoDB. Some cons reported by authenticated users include:
User interface, which has a fairly steep learning curve.
Lack of joins, which can make some data retrieval projects difficult.
Occasional slowness in the cloud environment.
High memory consumption
Poorly structured documentation.
Lack of built-in analytics.
Pros and Cons of MySQL
MySQL gets a slightly higher rating (8.6 out of 10 stars) on TrustRadius than MongoDB. Despite the higher rating, authenticated users still mention plenty of pros and cons of choosing MySQL.
Some of the positive features that users mention frequently include MySQL’s:
Portability that lets it connect to secondary databases easily.
Ability to store relational data.
Fast speed.
Excellent reliability.
Exceptional data security standards.
User-friendly interface that helps beginners complete projects.
Easy configuration and management.
Quick processing.
Of course, even people who enjoy using MySQL find features that they don’t like. Some of their complaints include:
Reliance on SQL, which creates a steeper learning curve for users who
do not know the language.
Lack of support for full-text searches in InnoDB tables.
Occasional stability issues.
Dependence on add-on features.
Limitations on fine-tuning and common table expressions.
Difficulties with some complex data types.
MongoDB vs MySQL Performance
When comparing the performance of MongoDB and MySQL, you must consider how each database will affect your projects on a case-by-case basis. While some performance features may appear to be objectively promising, your team members may never use the features that drew you to a database in the first place.
MongoDB Performance
Many people claim that MongoDB outperforms MySQL because it allows them to create queries in multiple ways. To put it another way, MongoDB can be used without knowing SQL. While the flexibility improves MongoDB's performance for some organizations, SQL queries will suffice for others.
MongoDB is also praised for its ability to handle large amounts of unstructured data. Depending on the types of data you collect, this feature could be extremely useful.
MongoDB does not bind you to a single vendor, giving you the freedom to improve its performance. If a vendor fails to provide you with excellent customer service, look for another vendor.
MySQL Performance
MySQL performs extremely well for teams that want an open-source relational database that can store information in multiple tables. The performance that you get, however, depends on how well you configure the MySQL database. Configurations should differ depending on the intended use. An e-commerce site, for example, might need a different MySQL configuration than a team of research scientists.
No matter how you plan to use MySQL, the database’s performance gets a boost from full-text indexes, a high-speed transactional system, and memory caches that prevent you from losing crucial information or work.
If you don’t get the performance that you expect from MySQL data warehouses and databases, you can improve performance by integrating them with an excellent ETL tool that makes data storage and manipulation easier than ever.
MySQL vs MongoDB Speed
In most speed comparisons between MySQL and MongoDB, MongoDB is the clear winner. MongoDB is much faster than MySQL at accepting large amounts of unstructured data. When dealing with large projects, it's difficult to say how much faster MongoDB is than MySQL. The speed you get depends on a number of factors, including the bandwidth of your internet connection, the distance between your location and the database server, and how well you organise your data.
If all else is equal, MongoDB should be able to handle large data projects much faster than MySQL.
Choosing Between MySQL and MongoDB
Whether you choose MySQL or MongoDB probably depends on how you plan to use your database.
Choosing MySQL
For projects that require a strong relational database management system, such as storing data in a table format, MySQL is likely to be the better choice. MySQL is also a great choice for cases requiring data security and fault tolerance. MySQL is a good choice if you have high-quality data that you've been collecting for a long time.
Keep in mind that to use MySQL, your team members will need to know SQL. You'll need to provide training to get them up to speed if they don't already know the language.
Choosing MongoDB
When you want to use data clusters and search languages other than SQL, MongoDB may be a better option. Anyone who knows how to code in a modern language will be able to get started with MongoDB. MongoDB is also good at scaling quickly, allowing multiple teams to collaborate, and storing data in a variety of formats.
Because MongoDB does not use data tables to make browsing easy, some people may struggle to understand the information stored there. Users can grow accustomed to MongoDB's document-oriented storage system over time.

Using both Mongodb and Mysql in one project

I have been working to learn Mongodb effectively for one week in order to use for my project. In my project, I will store a huge geolocation data and I think Mongodb is the most appropriate to store this information. In addition, speed very important for me and Mongodb responds faster than Mysql.
However, I will use some joins for some parts of the project, and I'm not sure whether I store user's information in Mongodb or not. I heard some issues can occur in mongodb during writing process. should I use only mongodb with collections (instead of join) or both of them?
In most situations I would recommend choosing one db for a project, if the project is not huge. On really big projects (or enterprises in general), I think long term organizations will use a combination of
RDBMS for highly transactional OLTP
NoSQL
a datawarehousing/BI project
But for things of more reasonable scope, just pick the one that does the core of the use case, and use it for everything.
IMO storing user data in mongodb is fine -- you can do atomic operations on single BSON documents so operations like "allocate me this username atomically" are doable. With redo logs (--journal) (v1.8+), replication, slavedelayed replication, it is possible to have a pretty high degree of data safety -- as high as other db products on paper. The main argument against safety would be the product is new and old software is always safer.
If you need to do very complex ACID transactions -- such as accounting -- use an RDBMS.
Also if you need to do a lot of reporting, mysql may be better at the moment, especially if the data set fits on one server. The SQL GROUP BY statement is quite powerful.
You won't be JOINing between MongoDB and MySQL.
I'm not sure I agree with all of your statements. Relative speed is something that's best benchmarked with your use case.
What you really need to understand is what the relative strengths and weaknesses of the two databases are:
MySQL supports the relational model, sets, and ACID; MongoDB does not.
MongoDB is better suited for document-based problems that can afford to forego ACID and transactions.
Those should be the basis for your choice.
MongoDB has some nice features in to support geo-location work. It is not however necessarily faster out of the box than MySQL. There have been numerous benchmarks run that indicate that MySQL in many instances outperforms MongoDB (e.g. http://mysqlha.blogspot.com/2010/09/mysql-versus-mongodb-yet-another-silly.html).
Having said that, I've yet to have a problem with MongoDB losing information during writing. I would suggest that if you want to use MongoDB, you use if for the users as well, which will avoid having to do cross database 'associations', and then only migrate the users to MySQL away if it becomes necessary.

Cassandra or MySQL/PostgreSQL?

I have huge database (kinda wordnet) and want to know if it's easier to use Cassandra instead of MySQL|PostrgreSQL
All my life I was using MySQL and PostrgreSQL and I could easily think in terms of relational algebra, but several weeks ago I learned about Cassandra and that it's used in Facebook and Twitter.
Is it more convenient?
What DBMS are usually used nowadays to store social net's data, relationships between objects, wordnet?
There is nothing like a Silver bullet solution, everything is built to solve specific problem and has its own pros and cons. It is up to you to decide - what problem statement you have and what is best solution that fits your problem. Whether you use Cassandra (NoSQL) or MySQL(RDBMS), it is all driven from your system's requirements. Below are the inputs that will help you in taking better decision while deciding on database.
Why to Use NoSQL
In the case of RDBMS database, making choice is quite easy because almost all the databases like MySQL, Oracle, MS SQL, PostgreSQL in this category offer almost same kind of solutions oriented to the ACID property. When it comes to NoSQL, decision becomes difficult because every NoSQL database offers different solution and you have to understand which one is best suited for your app/system requirement. For example, MongoDB fits for use cases where your system demands schema-less document store. HBase might fit for Search engines, analysing log data, any place where scanning huge, two-dimensional join-less tables is a requirement. Redis is built to provide In-Memory search for varieties of data structures like tree, queue, link list etc and can be good fit for making real time leader board, pub-sub kind of system. Similarly there are other database in this category (including Cassandra) which fits for different problems. Now lets move to original question, and answer them one by one.
When to use Cassandra
Being a part of NoSQL family, Cassandra offers solution for problem where your requirement is to have very heavy write system and you want to have quite responsive reporting system on top of that stored data. Consider use case of Web analytics where log data is stored for each request and you want to built analytical platform around it to count hits by hour, by browser, by IP, etc in real time manner. You can refer to blog post (http://blogs.shephertz.com/2015/04/22/why-cassandra-excellent-choice-for-realtime-analytics-workload/) to understand more about the use cases where Cassandra fits in.
When to Use a RDMS instead of Cassandra/NoSQL
Cassandra is based on NoSQL database and does not provide ACID and relational data property. If you have strong requirement of ACID property (for example Financial data), Cassandra would not be a fit in that case. Obviously, you can make work out of it, however you will end up writing lots of application code to handle ACID property and will loose on time to market badly. Also managing that kind of system with Cassandra would be complex and tedious for you.
There are many different flavours of "NoSQL" databases. If your application is really like Wordnet perhaps you should look at a graph database such as Neo4j.
I would suggest to analyse your request.
If you are going with more clusters, machines take NoSQL
If your data model is complicated - require efficient structures take NoSQL (no limits with type of columns)
If you fit in a few machines without scales, and you don't need super performance for multi request (as for example in social network - where lot of users send http request), and you don't think you involve saleability take RDBMS (Postgres have some good functions and structures which you can use, like array column type).
Cassandra should work better with large scales of data, multi purpose.
neo4j - would be better for special structures, graphs.
Cassandra and other NoSQL stores are being used for social based sites because of their need for massive write based operations. Not that MySQL and Postgres can't achieve this but NoSQL requires far less time and money, generally speaking.
Sounds like you may want to look at Neo4J though, just in terms of your object model needs.
All different products and they all have their pro's and conn's. What kind of problem do you have to solve?
Huge, as in TB's?