When to use MongoDB [closed] - mysql

I'm writing an application that doesn't necessarily need to scale, as it won't be collecting large amounts of data at the beginning. (However, if I'm lucky, it might need to down the road.)
I will be running my web server and database on the same box (for now).
That being said, I am looking for performance and efficiency.
The main part of my application will be loading blog articles. Using an RDBMS (MySQL) I will make 6 queries (2 of the queries being joins), just to load a single blog article page.
select blog
select blog_album
select blog_tags
select blog_notes
select blog_comments (join with users)
select blog_author_participants (join with users)
However, with MongoDB I can de-normalize and flatten 6 tables into just 2 tables/collections and minimize my queries to potentially just 1 query:
users
blogs
->blog_album
->blog_tags
->blog_notes
->blog_comments
->blog_author_participants
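To make the flattened layout concrete, here is a rough sketch of what a single blog document might look like. A plain Python dict stands in for a MongoDB document, and every field name and sample value below is an assumption for illustration rather than a real schema.

```python
import json

# Hypothetical denormalized "blogs" document: everything needed to render
# one article page lives in a single record.
blog_doc = {
    "_id": 1,
    "title": "Hello world",
    "body": "First post.",
    "blog_album": [{"file": "cover.jpg"}],
    "blog_tags": ["intro", "meta"],
    "blog_notes": ["draft reviewed"],
    # User names are copied in next to the user id. This is the kind of
    # redundancy a flattened schema introduces: renaming a user means
    # touching every copy.
    "blog_comments": [
        {"user_id": 7, "user_name": "alice", "text": "Nice!"},
    ],
    "blog_author_participants": [
        {"user_id": 7, "user_name": "alice"},
    ],
}


def render_inputs(doc):
    """Return the pieces a page template would need, with no further lookups."""
    return {
        "title": doc["title"],
        "tags": doc["blog_tags"],
        "commenters": [c["user_name"] for c in doc["blog_comments"]],
    }


print(json.dumps(render_inputs(blog_doc), indent=2))
```

The page can be built from one read, at the cost of keeping the copied user names in sync.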
Now, going with the MongoDB schema, there will be some data redundancy. However, hard drive space is cheaper than CPU/servers.
1.) Would this be a good scenario to use MongoDB?
2.) Do you only benefit in performance using MongoDB when scaling beyond a single server?
3.) Are there any durability risks using MongoDB? I hear that there is potential for loss of data while performing inserts - as inserts are written to memory first, then to the database.
4.) Should this stop me from using MongoDB in production?

You would use MongoDB when you have a use case that matches its strengths.
Do you need a schema-less document store? Nope, you have a stable schema.
Do you need automatic sharding? Nope, you don't have extraordinary data needs or budget for horizontally scaling hardware.
Do you need map/reduce data processing? Not for something like a blog.
So why are you even considering it?

However, with MongoDB I can de-normalize and flatten 6 tables into just 2 tables/collections and minimize my queries to potentially just 1 query
But you can easily query MySQL for 6 tables worth of information related to a single blog post with a single properly crafted SQL statement.
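As a minimal sketch of what such a single statement could look like, the example below uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere. The column names and sample rows are assumptions, only three of the six tables are shown to keep it short, and MySQL's GROUP_CONCAT uses a SEPARATOR clause rather than a second argument.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE blog (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE blog_tags (blog_id INTEGER, tag TEXT);
CREATE TABLE blog_comments (blog_id INTEGER, user_id INTEGER, body TEXT);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO blog VALUES (10, 'Hello world');
INSERT INTO blog_tags VALUES (10, 'intro'), (10, 'meta');
INSERT INTO blog_comments VALUES (10, 1, 'Nice!'), (10, 2, 'Thanks');
""")

# One statement: each one-to-many relation is collapsed by its own
# correlated subquery, so tags and comments do not multiply each other
# the way a plain multi-way JOIN would.
row = conn.execute("""
SELECT b.title,
       (SELECT group_concat(t.tag, ',')
          FROM blog_tags t
         WHERE t.blog_id = b.id) AS tags,
       (SELECT group_concat(u.name || ': ' || c.body, ' | ')
          FROM blog_comments c
          JOIN users u ON u.id = c.user_id
         WHERE c.blog_id = b.id) AS comments
  FROM blog b
 WHERE b.id = ?
""", (10,)).fetchone()

print(row)
```

The application still has to split the concatenated columns, so whether this beats a handful of simple indexed queries is something to measure rather than assume.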
however hard drive space is cheaper than CPU/servers.
If performance and scaling is a priority then you are going to be concerned with having enough RAM to fit everything into main memory and enough CPU cores to run queries. An enterprise grade RAID 10 array is a requirement, don't get me wrong, but as soon as your database software (MongoDB or MySQL) needs to scan an index that can't fit into main memory you'll be in for a world of pain assuming a large active database. :)
I like MongoDB, but its big strengths, in my mind, are map/reduce and its document orientation. You require neither of those features. MySQL is time-tested in large-scale deployments and supports partitioning, but I would argue that your database would have to be on the order of 50-100 GB before you could realize a substantial gain from partitioning versus a single server (plus a passive backup) with tons (64 GB+) of RAM. I would also argue that if performance is truly a concern, then MySQL would be preferable, as you would have supreme control over your indexes.
That's not to say that MongoDB isn't high performance, but its place probably isn't serving blogs. Your concern with inserts is valid as well. MongoDB is not an ACID system. Look up how transactions work in both systems and compare.

Here is a good explanation: http://mod.erni.st/nosql-if-only-it-was-that-easy/
The last paragraph summarizes it:
What am I going to build my next app on? Probably Postgres. Will I use NoSQL? Maybe. I might also use Hadoop and Hive. I might keep everything in flat files. Maybe I’ll start hacking on Maglev. I’ll use whatever is best for the job. If I need reporting, I won’t be using any NoSQL. If I need caching, I’ll probably use Tokyo Tyrant. If I need ACIDity, I won’t use NoSQL. If I need a ton of counters, I’ll use Redis. If I need transactions, I’ll use Postgres. If I have a ton of a single type of documents, I’ll probably use Mongo. If I need to write 1 billion objects a day, I’d probably use Voldemort. If I need full text search, I’d probably use Solr. If I need full text search of volatile data, I’d probably use Sphinx.

NoSQL vs. RDBMS: Apples and Oranges?
I would advise you to read up a little on what NoSQL is and what it does before you decide whether you can use it. You can't take a normal database and turn it into a NoSQL thing just like that. The way you work with the data is completely different.
NoSQL definitely has its uses. But it's definitely not the answer for everything. The main advantage of NoSQL is the easily changeable data model.

Advantages of using MongoDB (as per a DZone article by Moshe Kaplan):
Schema-less design
Scalability in managing terabytes of data
Replica sets for high availability
Sharding enables linear, scale-out growth without running out of budget
Support for high write loads
Use of data locality for query processing
MongoDB meets the Consistency and Partition tolerance requirements of the CAP theorem (Consistency, Availability, and Partition tolerance)
Related SE questions:
What are the advantages of using a schema-free database like MongoDB compared to a relational database?
When to Redis? When to MongoDB?

I can't speak to the performance considerations, but for me, the first consideration of whether you want to use a SQL-DB vs MongoDB is the structure of the data you want to store.
MongoDB is "schema-less" in the sense that you don't need to know what "tables" and "columns" you want beforehand. It is very flexible. So, if you don't know what information you want to store in your "blogs" Collection for example, or if different blog posts may store different information, then MongoDB allows this flexibility. Whereas with SQL relational databases, you have to know your schema upfront.
But it sounds like you already know what information you want to store, in which case I might just stick with a SQL relational database. I don't think performance is the first consideration in your case - you're not building a real-time application where one or two milliseconds matter all that much.

Related

Performance Analysis of CouchDB

I am developing a discussion forum for my university, and I'm using CouchDB as the database to store and manipulate the data.
I'm having difficulty designing the structure of my database so as to maximize its performance.
I want to discuss what good practice is when designing a document database.
Should we create only one database, as we would in SQL, and store 'n' documents in it?
Or should we create several databases in order to flatten the structure? That would also reduce the number of documents that have to be created.
The questions you need to ask are simply this: "How do you want to get data out of your database?"
Database design hinges around the queries to be made, not what is available to be stored.
This is especially important for document DBs like Couch since, while it does have a flexible schema, it does not have flexible indexing. By that I mean that, because of the granularity of the data, it's quite likely that later on, when you need to ask a question that the database was not designed to answer, answering that question may well be very expensive. It's much, much cheaper to design your views and other constructs early, when there is little data in the database, rather than later, after you have thousands or millions of rows.
RDBMSs, since they tend to have a finer granularity of data, tend to be more nimble with new queries and such later in life. Document DBs, not so much.
So think through your use cases up front, design around those, and do it early on; it's much less painful now than later.
It's hard to tell the right way to approach modeling your data since you don't give much information. Generally though you want to keep as much data as possible in one database as this allows you to index it together (indexes cannot span more than one database).
Also, since there is no schema enforcement in the database, you can create different types of records in each database. For example, there is nothing wrong with having both user information and forum entries in the same database.
Last, you will most likely want to keep messages and their replies in different records. This is an old but still relevant discussion on this topic: http://www.cmlenz.net/archives/2007/10/couchdb-joins
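One collation pattern commonly used for this can be sketched as follows. This is a plain-Python emulation and not CouchDB code: real CouchDB views are JavaScript map functions stored in a design document, and the document fields ("type", "post", "text") and the helper names are assumptions made up for the example.

```python
# Each forum post and each reply is its own record, as suggested above.
docs = [
    {"_id": "p1", "type": "post", "title": "Exam dates"},
    {"_id": "r1", "type": "reply", "post": "p1", "text": "June?"},
    {"_id": "p2", "type": "post", "title": "Library hours"},
    {"_id": "r2", "type": "reply", "post": "p1", "text": "Confirmed."},
]


def map_doc(doc):
    """Emulates a view's map function: yields (key, value) pairs."""
    if doc["type"] == "post":
        yield (doc["_id"], 0), doc["title"]
    elif doc["type"] == "reply":
        yield (doc["post"], 1), doc["text"]


# The view index is the emitted rows sorted by key, so a post (0) sorts
# directly ahead of its own replies (1). The sort is stable, so replies
# keep the order in which they appear in docs.
index = sorted(
    (row for doc in docs for row in map_doc(doc)),
    key=lambda row: row[0],
)


def thread(post_id):
    """Range read over the sorted keys: the post followed by its replies."""
    return [value for (pid, _kind), value in index if pid == post_id]


print(thread("p1"))
```

The point is that the shape of the key decides which questions are cheap to answer, which is why it pays to settle on the queries before the data grows.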
Cheers.

How to choose SQL Vs nosql storage [duplicate]

I have a project where I'm expecting large amount of live traffic and location information. The project hasn't started yet. I'm still in the architectural design phase. So there's no fear of migration or backward compatibility problems.
I have previous knowledge of mysql, and relational data bases, but this would be my first encounter with nosql.
My question is: should I choose a sql or nosql storage? I know there are lots of opinions about this issue, and I've been doing some reading, but I'm still not sure based on which factors do I decide between them?
The question is how large your amount of traffic is going to be. NoSQL databases have the advantage that they scale very well because of their simpler data model (they can be distributed more easily). But this also means that you have to give up a lot of what relational databases provide, first of all integrity mechanisms and a complex and convenient query language.
So I guess the first step is to make up your mind about your expected traffic and how much you need to scale. If a single database server will be able to handle the workload, you might want to go for a relational database.
The second aspect is the retrieval of your data. In relational databases you have SQL, which allows you to formulate very specific queries. On the other hand, the relational model often forces you to distribute your data across multiple tables, even though the pieces really belong together (like an order + the ordered items). That's one benefit of NoSQL databases like MongoDB, where you would store things that belong together as a single document. Retrieval of this aggregate is then easy, but if you want to do more complex queries you have to do them manually outside of the database.
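A small sketch of that trade-off, using plain Python dicts in place of a document store (the order data and function names are invented for illustration): fetching one order with its items is a single lookup, while a question that cuts across orders has to be answered in application code.

```python
from collections import Counter

# Each order is stored as one aggregate: the order plus its ordered items.
orders = {
    "o1": {"customer": "alice",
           "items": [{"sku": "pen", "qty": 2}, {"sku": "ink", "qty": 1}]},
    "o2": {"customer": "bob",
           "items": [{"sku": "pen", "qty": 5}]},
}


def load_order(order_id):
    """Easy case: one key lookup returns everything that belongs together."""
    return orders[order_id]


def units_sold_per_sku():
    """Harder case: a cross-order question is computed outside the store."""
    totals = Counter()
    for order in orders.values():
        for item in order["items"]:
            totals[item["sku"]] += item["qty"]
    return dict(totals)


print(load_order("o1")["items"])
print(units_sold_per_sku())
```

In a relational schema the second question would be a single GROUP BY over an order-items table.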
So in the end you would use NoSQL mainly for simpler access patterns and if you want/need to scale. Relational DBMS have their advantage in the amount of functionality they give you. But for many tasks it is kind of unnatural to press your data into relations. Scaling is possible but more complicated.

AWS MySQL RDS vs AWS DynamoDB [closed]

I've been using MySQL for a fair while now and I'm comfortable with its structure & SQL Queries etc.
Currently building a new system in AWS and I've been looking at DynamoDB. Currently I only know a little about it.
Is one better than the other?
What are the advantages of DynamoDB?
What is the transition like from MySQL queries etc. to this flat-style DB?
Really DynamoDB and MySQL are apples and oranges. DynamoDB is a NoSQL storage layer while MySQL is used for relational storage. You should pick what to use based on the actual needs of your application. In fact, some applications might be well served by using both.
If, for example, you are storing data that does not lend itself well to a relational schema (tree structures, schema-less JSON representations, etc.) that can be looked up against a single key or a key/range combination then DynamoDB (or some other NoSQL store) would likely be your best bet.
If you have a well-defined schema for your data that can fit well in a relational structure and you need the flexibility to query the data in a number of different ways (adding indexes as necessary of course), then RDS might be a better solution.
The main benefit for using DynamoDB as a NoSQL store is that you get guaranteed read/write throughput at whatever level you require without having to worry about managing a clustered data store. So if your application requires 1000 reads/writes per second, you can just provision your DynamoDB table for that level of throughput and not really have to worry about the underlying infrastructure.
RDS has much of the same benefit of not having to worry about the infrastructure itself, however if you end up needing to do a significant number of writes to the point where the largest instance size will no longer keep up, you are kind of left without options (you can scale horizontally for reads using read replicas).
Updated note: DynamoDB now supports global secondary indexes, so you can perform optimized lookups on data fields other than the hash key or the combination of hash and range keys.
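The difference between a key/range lookup and a lookup on any other field can be illustrated with a toy in-memory model. This is not the DynamoDB API; the class name KeyRangeTable and its methods are hypothetical and only meant to show why access by hash key plus range key is cheap while filtering on a non-key field is not (unless a secondary index exists).

```python
from bisect import bisect_left, bisect_right


class KeyRangeTable:
    """Toy in-memory model of a table addressed by (hash key, range key)."""

    def __init__(self):
        # hash key -> list of (range key, item), kept sorted by range key
        self._partitions = {}

    def put(self, hash_key, range_key, item):
        rows = self._partitions.setdefault(hash_key, [])
        keys = [k for k, _ in rows]
        pos = bisect_left(keys, range_key)
        if pos < len(rows) and rows[pos][0] == range_key:
            rows[pos] = (range_key, item)  # overwrite the existing item
        else:
            rows.insert(pos, (range_key, item))

    def query(self, hash_key, low, high):
        """Cheap path: one partition, one contiguous slice of range keys."""
        rows = self._partitions.get(hash_key, [])
        keys = [k for k, _ in rows]
        start, stop = bisect_left(keys, low), bisect_right(keys, high)
        return [item for _, item in rows[start:stop]]

    def scan(self, predicate):
        """Expensive path: filtering on a non-key field touches every item."""
        return [item
                for rows in self._partitions.values()
                for _, item in rows
                if predicate(item)]


table = KeyRangeTable()
table.put("device-1", 100, {"temp": 20})
table.put("device-1", 200, {"temp": 25})
table.put("device-2", 150, {"temp": 30})

print(table.query("device-1", 50, 150))
print(table.scan(lambda item: item["temp"] >= 25))
```

If most of your access looks like query(), a key/range store fits well; if it looks like scan(), a relational database with ad hoc indexes is likely the easier home for the data.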
We have just migrated all of our DynamoDB tables to RDS MySQL.
While using DynamoDB for specific tasks may make sense, building a new system on top of DynamoDB is really a bad idea. Best laid plans etc., you always need that extra flexibility from your DB.
Here are our reasons we moved from DynamoDB:
Indexing - Changing or adding keys on-the-fly is impossible without creating a new table.
Queries - Querying data is extremely limited. Especially if you want to query non-indexed data. Joins are of course impossible so you have to manage complex data relations on your code/cache layer.
Backup - Such a tedious backup procedure is a disappointing surprise compared to the slick backup of RDS.
GUI - bad UX, limited search, no fun.
Speed - Response time is problematic compared to RDS. You find yourself building elaborate caching mechanisms to compensate for it in places where you would have settled for RDS's internal caching.
Data Integrity - While the concept of fluid data structure sounds nice to begin with, some of your data is better "set in stone". Strong typing is a blessing when a little bug tries to destroy your database. With DynamoDB anything is possible and indeed anything that can go wrong does.
We now use DynamoDB as a backup for some systems and I'm sure we'll use it in the future for specific, well defined tasks. It's not a bad DB, it's just not the DB to serve 100% of your core system.
As far as advantages go, I'd say Scalability and Durability. It scales incredibly and transparently and it's (sort of) always up. These are really great features, but they do not compensate in any way for the downside aspects.
You can read AWS explanation about it here.
In short, if you have mainly lookup queries (and not join queries), DynamoDB (or another NoSQL DB) is better. If you need to handle a lot of data, you will be limited when using MySQL (or another RDBMS).
You can't reuse your MySQL queries or your data schema, but if you spend the effort to learn NoSQL, you will add an important tool to your toolbox. There are many cases where DynamoDB gives the simplest solution.
When using DynamoDB you should also know that items/records are limited to 400KB (see DynamoDB Limits). For many use cases this will not work. So DynamoDB will be good for a few things but not all. The same goes for many of the other NoSQL databases.
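A quick way to get a feel for that limit is to estimate record sizes before committing to a design. The sketch below is only an approximation: it measures the UTF-8 length of a compact JSON encoding, which is not how DynamoDB itself computes item size, and it assumes "400KB" means 400 * 1024 bytes. The function names are made up for the example.

```python
import json

ITEM_LIMIT_BYTES = 400 * 1024  # assumption: "400KB" read as 400 * 1024 bytes


def approx_item_bytes(item):
    """Rough size estimate: UTF-8 length of the compact JSON encoding."""
    return len(json.dumps(item, separators=(",", ":")).encode("utf-8"))


def fits_in_one_item(item):
    """True if the estimate is within the assumed per-item limit."""
    return approx_item_bytes(item) <= ITEM_LIMIT_BYTES


small = {"id": "a1", "body": "x" * 1000}
large = {"id": "a2", "body": "x" * 500_000}

print(fits_in_one_item(small), fits_in_one_item(large))
```

Records that come out near or over the limit would have to be split across items or stored elsewhere with only a pointer kept in the table.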

MongoDB vs Mysql Storage space compare

I am building a data warehouse in the range of 15+ TB. While storage is cheap, our limited budget means we have to squeeze as much data as possible into that space while maintaining performance and flexibility, since the data format changes quite frequently.
I tried Infobright (Community Edition) as a SQL solution, and it works wonderfully in terms of storage and performance, but the limitations on data/table alteration make it almost a no-go, and Infobright's pricing for the enterprise version is quite steep.
After checking out MongoDB, it seems promising except for one thing. I was in a chat with a 10gen guy, and he stated that they don't really give much thought to storage space, since they flatten out the data to achieve performance and flexibility, and in their opinion storage is too cheap nowadays to be bothered with.
So can any experienced Mongo users out there comment on its storage space vs MySQL (as it is the standard we are comparing against right now)? If it's larger or smaller, can you give a rough ratio? I know it's very dependent on what sort of data you put in SQL and how you define the fields, indexing and such, but I am just trying to get a general idea.
Thanks for the help in advance!
MongoDB is not optimized for small disk space - as you've said, "disk is cheap".
From what I've seen and read, it's pretty difficult to estimate the required disk space due to:
Padding of documents to allow in-place updates
Attribute names are stored in each document, so you might save quite a bit by using abbreviations
No built in compression (at the moment)
...
IMHO the general approach is to build a prototype, insert data and see how much disk space your specific use case requires. The more realistically you can model your queries (inserts and updates), the better your result will be.
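As a very rough first step before a real prototype, you can get a feel for the attribute-name overhead with a few lines of Python. This sketch uses compact JSON as a stand-in for MongoDB's on-disk format, so the absolute numbers will not match a real deployment; the field names and record count are invented for the example.

```python
import json


def encoded_bytes(records):
    """Total UTF-8 size of the records when each one carries its own keys."""
    return sum(
        len(json.dumps(r, separators=(",", ":")).encode("utf-8"))
        for r in records
    )


n = 10_000
long_names = [{"customer_identifier": i, "transaction_amount": i * 2}
              for i in range(n)]
short_names = [{"c": i, "t": i * 2} for i in range(n)]

long_size = encoded_bytes(long_names)
short_size = encoded_bytes(short_names)

# Same values, different key lengths: the gap is pure attribute-name overhead.
print(long_size, short_size, round(long_size / short_size, 2))
```

The smaller the values are relative to the key names, the more abbreviation pays off, but only a prototype loaded with your real data will tell you the actual ratio.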
For more details see http://www.mongodb.org/display/DOCS/Excessive+Disk+Space as well.
Pros and Cons of MongoDB
For the most part, users seem to like MongoDB. Reviews on TrustRadius give the document-oriented database 8.3 out of 10 stars.
Some of the things that authenticated MongoDB users say they like about the database include its:
Scalability.
Readable queries.
NoSQL.
Change streams and graph queries.
A flexible schema for altering data elements.
Quick query times.
Schema-less data models.
Easy installation.
Users also have negative things to say about MongoDB. Some cons reported by authenticated users include:
User interface, which has a fairly steep learning curve.
Lack of joins, which can make some data retrieval projects difficult.
Occasional slowness in the cloud environment.
High memory consumption
Poorly structured documentation.
Lack of built-in analytics.
Pros and Cons of MySQL
MySQL gets a slightly higher rating (8.6 out of 10 stars) on TrustRadius than MongoDB. Despite the higher rating, authenticated users still mention plenty of pros and cons of choosing MySQL.
Some of the positive features that users mention frequently include MySQL’s:
Portability that lets it connect to secondary databases easily.
Ability to store relational data.
Fast speed.
Excellent reliability.
Exceptional data security standards.
User-friendly interface that helps beginners complete projects.
Easy configuration and management.
Quick processing.
Of course, even people who enjoy using MySQL find features that they don’t like. Some of their complaints include:
Reliance on SQL, which creates a steeper learning curve for users who do not know the language.
Lack of support for full-text searches in InnoDB tables.
Occasional stability issues.
Dependence on add-on features.
Limitations on fine-tuning and common table expressions.
Difficulties with some complex data types.
MongoDB vs MySQL Performance
When comparing the performance of MongoDB and MySQL, you must consider how each database will affect your projects on a case-by-case basis. While some performance features may appear to be objectively promising, your team members may never use the features that drew you to a database in the first place.
MongoDB Performance
Many people claim that MongoDB outperforms MySQL because it allows them to create queries in multiple ways. To put it another way, MongoDB can be used without knowing SQL. While the flexibility improves MongoDB's performance for some organizations, SQL queries will suffice for others.
MongoDB is also praised for its ability to handle large amounts of unstructured data. Depending on the types of data you collect, this feature could be extremely useful.
MongoDB does not bind you to a single vendor, giving you the freedom to improve its performance. If a vendor fails to provide you with excellent customer service, look for another vendor.
MySQL Performance
MySQL performs extremely well for teams that want an open-source relational database that can store information in multiple tables. The performance that you get, however, depends on how well you configure the MySQL database. Configurations should differ depending on the intended use. An e-commerce site, for example, might need a different MySQL configuration than a team of research scientists.
No matter how you plan to use MySQL, the database’s performance gets a boost from full-text indexes, a high-speed transactional system, and memory caches that prevent you from losing crucial information or work.
If you don’t get the performance that you expect from MySQL data warehouses and databases, you can improve performance by integrating them with an excellent ETL tool that makes data storage and manipulation easier than ever.
MySQL vs MongoDB Speed
In most speed comparisons between MySQL and MongoDB, MongoDB is the clear winner. MongoDB is much faster than MySQL at accepting large amounts of unstructured data. When dealing with large projects, it's difficult to say how much faster MongoDB is than MySQL. The speed you get depends on a number of factors, including the bandwidth of your internet connection, the distance between your location and the database server, and how well you organise your data.
If all else is equal, MongoDB should be able to handle large data projects much faster than MySQL.
Choosing Between MySQL and MongoDB
Whether you choose MySQL or MongoDB probably depends on how you plan to use your database.
Choosing MySQL
For projects that require a strong relational database management system, such as storing data in a table format, MySQL is likely to be the better choice. MySQL is also a great choice for cases requiring data security and fault tolerance. MySQL is a good choice if you have high-quality data that you've been collecting for a long time.
Keep in mind that to use MySQL, your team members will need to know SQL. You'll need to provide training to get them up to speed if they don't already know the language.
Choosing MongoDB
When you want to use data clusters and search languages other than SQL, MongoDB may be a better option. Anyone who knows how to code in a modern language will be able to get started with MongoDB. MongoDB is also good at scaling quickly, allowing multiple teams to collaborate, and storing data in a variety of formats.
Because MongoDB does not use data tables to make browsing easy, some people may struggle to understand the information stored there. Users can grow accustomed to MongoDB's document-oriented storage system over time.

Is there a high performance difference in a Key-Value db on a single server with MySQL vs. NoSQL

In my PHP application I have a 470M rows table weighing 200GB in a MySQL MyISAM partitioned table on one server. Usage includes 70% Writes/30% Reads.
I'm trying to improve performance. Main problem currently is read/write contentions due to table-level locks. I'm trying to decide between two options:
Changing the MySQL table to InnoDB. Pros: avoids the table-level locks. Cons: much more disk space; I would need bigger HDs, which might not be as fast as the current ones (currently using RAID 10, 6*300GB SAS 15k).
Moving data to a NoSQL db. Main Con: Learning curve. Have never used NoSQL before.
The question is: while still trying to avoid sharding the data, and considering that I'm using the MySQL RDBMS as a simple key-value store, is there a big performance difference between the two approaches, or does NoSQL's main advantage only come when moving to a distributed system?
I can only answer your question partially but hopefully more than a comment.
MongoDB is not typically a key-value store and has been known to have certain performance hits when used as one.
MongoDB also has a locking problem here that could come back to haunt you. It has a DB-level lock at the moment, which means it could (this would need testing) cause write lock saturation.
It is also heavily designed for an 80% read app (which is said to be the most common setup for websites nowadays), so the more writes you do, the more you will notice a performance drop over time. That being said, you can tweak MongoDB to be more write friendly, and the distributed nature does help to stop write lock saturation a little.
However, in my personal opinion, the learning curve of moving to MongoDB from SQL:
Was next to nil
More natural and simpler to implement into my app than SQL
Query language is simple making it dead easy to get to grips with
Query language has a lot of similarities to SQL
The drivers are standardised so that the syntax you see in the Docs for the JS driver in the console is consistent across the board.
My personal opinion on the general matter is the distributed notion of it. If you get a NoSQL solution designed for key-value stores then it could be really good. A quick search on Google pulled out a small list of NoSQL key-value stores on Wikipedia: http://en.wikipedia.org/wiki/NoSQL#Key-value_stores_on_solid_state_or_rotating_disk