I'm looking at SQL & NoSQL Databases - namely MySQL and DynamoDB (both at AWS).
I'm building a dating/social network and demos I've built have been using a MySQL Database with around 50 tables for logical separation of data and then using SQL queries (often with joins) to extract required chucks of data to send back to browsers.
I'm moving to AWS and are doing a rebuilt of the system and wanted to know if it would be possible to write a site like this 100% in NoSQL. I understand you don't know the specifics of the site but it could be compared to any other dating/social network like facebook (obviously more involved) or Eharmony/Match Maker etc...
Could a Social Site be built 100% on NoSQL? or would a mix of NoSQL & SQL be move realistic?
thx
It's a very difficult question to answer without a deeper understanding of exactly what features you're after, and what language you are going to be writing the site in. There are lots of different types of NoSQL solutions.
NoSQL databases like Dynamo and Cassandra are Key-Value Stores. They offer a very different set of features than Document Databases like MongoDB and RavenDB. There are many other types as well.
Personally, I would be more than comfortable writing a social media site based entirely on RavenDB. But that's because I tend to focus on Domain Driven Design, and like to write in .Net/C#. It has all the features you would need, like querying indexes, map/reduce for big data jobs, full-text search, and spatial distance proximity searches. You could use their http/rest api if you wanted to program from php or javascript, but their C# client is much easier to use.
Your requirements may be different than mine would be though. I would encourage you to try out several different NoSQL technologies before you settle on one. You may still find that you need a SQL (or MySQL) database for certain things that your NoSQL solution doesn't handle. For example, RavenDB isn't recommended for ad-hoc reporting - so many people set up a separate SQL Server database and replicate data from Raven into SQL so they can provide a separate reporting database to their power users.
The biggest thing to remember is that most noSQL engines (like Cassandra) don't support querying, so that has to be a factor in your design (i.e. many things you take for granted in SQL like JOINs are much harder in a noSQL solution). With that being said, you most definitely can build full-featured applications using a noSQL solution. I encourage you to look into resources available from the many providers out there, like Cassandra, MongoDB, Dyanamo, and many others.
Related
Are there any benefits to using MongoDB for a Node.js application rather than a traditional SQL database such as MySQL, if I'm not planning to have large (>1000 item) collections and am already comfortable with SQL?
MongoDB is schema-less document based database. This means you can insert a JSON object with other nested objects. This can make development easier, especially for prototyping.
For a small project, why not? For a larger project you should do more research. Large or small, doesn't hurt to do the research anyway. You want to consider how your application uses the database (reads vs writes) and how MongoDB scales horizontally, and how it handles failures.
There's a thing called the CAP theorem that defines NoSQL databases. MongoDB is CP. This visual guide shows the relationships between different databases. What is most important to you and your application?
Something else to consider is that most NoSQL databases are not ACID compliant. If you're using MySQL with InnoDB, that can be something significant to give up, depending on your application. For example, transactions might be something you might not want to give up.
Lots of pros and cons. Best thing to ask yourself is: What am I gaining? What am I giving up? There are many things, and it really depends on your use-case.
There are lots of reasons to stick with a simple dbms for a small-scale application. One of them is the widespread availability of cheap hosting services providing MySQL. Another is ease of deployment and maintenance.
Of course, if you're trying to learn to use MongoDB, go for it!
I am building a data ware house that is the range of 15+ TBs. While storage is cheap, but due to limited budget we have to squeeze as much data as possible in to that space while maintaining performance and flexibility since the data format changes quiet frequently.
I tried Infobright(community edition) as a SQL solution and it works wonderful in term of storage and performance, but the limitation on data/table alteration is making it almost a no go. and infobright's pricing on enterprise version is quiet steep.
After checking out MongoDB, it seems promising except one thing. I was in a chat with a 10gen guy, and he stated that they don't really give much of a thought in term of storage space since they flatten out the data to achieve the performance and flexibility, and in their opinion storage is too cheap nowadays to be bother with.
So any experienced mongo user out there can comment on its storage space vs mysql (as it is the standard for what we comparing against to right now). if it's larger or smaller, can you give rough ratio? I know it's very situation dependent on what sort of data you put in SQL and how you define the fields, indexing and such... but I am just trying to get a general idea.
Thanks for the help in advance!
MongoDB is not optimized for small disk space - as you've said, "disk is cheap".
From what I've seen and read, it's pretty difficult to estimate the required disk space due to:
Padding of documents to allow in-place updates
Attribute names are stored in each collection, so you might save quite a bit by using abbreviations
No built in compression (at the moment)
...
IMHO the general approach is to build a prototype, insert data and see how much disk space your specific use case requires. The more realistic you can model your queries (inserts and updates) the better your result will be.
For more details see http://www.mongodb.org/display/DOCS/Excessive+Disk+Space as well.
Pros and Cons of MongoDB
For the most part, users seem to like MongoDB. Reviews on TrustRadius give the document-oriented database 8.3 out of 10 stars.
Some of the things that authenticated MongoDB users say they like about the database include its:
Scalability.
Readable queries.
NoSQL.
Change streams and graph queries.
A flexible schema for altering data elements.
Quick query times.
Schema-less data models.
Easy installation.
Users also have negative things to say about MongoDB. Some cons reported by authenticated users include:
User interface, which has a fairly steep learning curve.
Lack of joins, which can make some data retrieval projects difficult.
Occasional slowness in the cloud environment.
High memory consumption
Poorly structured documentation.
Lack of built-in analytics.
Pros and Cons of MySQL
MySQL gets a slightly higher rating (8.6 out of 10 stars) on TrustRadius than MongoDB. Despite the higher rating, authenticated users still mention plenty of pros and cons of choosing MySQL.
Some of the positive features that users mention frequently include MySQL’s:
Portability that lets it connect to secondary databases easily.
Ability to store relational data.
Fast speed.
Excellent reliability.
Exceptional data security standards.
User-friendly interface that helps beginners complete projects.
Easy configuration and management.
Quick processing.
Of course, even people who enjoy using MySQL find features that they don’t like. Some of their complaints include:
Reliance on SQL, which creates a steeper learning curve for users who
do not know the language.
Lack of support for full-text searches in InnoDB tables.
Occasional stability issues.
Dependence on add-on features.
Limitations on fine-tuning and common table expressions.
Difficulties with some complex data types.
MongoDB vs MySQL Performance
When comparing the performance of MongoDB and MySQL, you must consider how each database will affect your projects on a case-by-case basis. While some performance features may appear to be objectively promising, your team members may never use the features that drew you to a database in the first place.
MongoDB Performance
Many people claim that MongoDB outperforms MySQL because it allows them to create queries in multiple ways. To put it another way, MongoDB can be used without knowing SQL. While the flexibility improves MongoDB's performance for some organizations, SQL queries will suffice for others.
MongoDB is also praised for its ability to handle large amounts of unstructured data. Depending on the types of data you collect, this feature could be extremely useful.
MongoDB does not bind you to a single vendor, giving you the freedom to improve its performance. If a vendor fails to provide you with excellent customer service, look for another vendor.
MySQL Performance
MySQL performs extremely well for teams that want an open-source relational database that can store information in multiple tables. The performance that you get, however, depends on how well you configure the MySQL database. Configurations should differ depending on the intended use. An e-commerce site, for example, might need a different MySQL configuration than a team of research scientists.
No matter how you plan to use MySQL, the database’s performance gets a boost from full-text indexes, a high-speed transactional system, and memory caches that prevent you from losing crucial information or work.
If you don’t get the performance that you expect from MySQL data warehouses and databases, you can improve performance by integrating them with an excellent ETL tool that makes data storage and manipulation easier than ever.
MySQL vs MongoDB Speed
In most speed comparisons between MySQL and MongoDB, MongoDB is the clear winner. MongoDB is much faster than MySQL at accepting large amounts of unstructured data. When dealing with large projects, it's difficult to say how much faster MongoDB is than MySQL. The speed you get depends on a number of factors, including the bandwidth of your internet connection, the distance between your location and the database server, and how well you organise your data.
If all else is equal, MongoDB should be able to handle large data projects much faster than MySQL.
Choosing Between MySQL and MongoDB
Whether you choose MySQL or MongoDB probably depends on how you plan to use your database.
Choosing MySQL
For projects that require a strong relational database management system, such as storing data in a table format, MySQL is likely to be the better choice. MySQL is also a great choice for cases requiring data security and fault tolerance. MySQL is a good choice if you have high-quality data that you've been collecting for a long time.
Keep in mind that to use MySQL, your team members will need to know SQL. You'll need to provide training to get them up to speed if they don't already know the language.
Choosing MongoDB
When you want to use data clusters and search languages other than SQL, MongoDB may be a better option. Anyone who knows how to code in a modern language will be able to get started with MongoDB. MongoDB is also good at scaling quickly, allowing multiple teams to collaborate, and storing data in a variety of formats.
Because MongoDB does not use data tables to make browsing easy, some people may struggle to understand the information stored there. Users can grow accustomed to MongoDB's document-oriented storage system over time.
I am building a social network (connections and their connections, messages and locations) and I am a little confused in deciding whether to go with a relational database (MySQL) or a no-sql system (MongoDB) when designing our backend APIs. Does anyone have any views on what to use when?
PS: I am building developer APIs for developers to tap into our system with oAuth. So scalability and performance is also key factor. Rails 3 + Devise (most likely).
This depends largely on which technology you are comfortable with, what exactly do you want to get out of this etc. etc.
Coming back to your question, not all data is relational. So For those situations, NoSQL can be helpful. With that said, NoSQL stands for "Not Only SQL". It's not intended to knock MySQL or supplant it.
SQL or MySQL has several very big advantages:
MySQL is Strong mathematical basis.
Declarative syntax.
A well-known language in Structured Query Language (SQL).
Highly proven and extremely reliable technology. MySQL has been around far more than the oldest noSQL. It's a mature piece of technology. Google Adsense runs on MySQL, Facebook persistent store is MySQL. The examples suggest its reliability.
As a result of being mature technology, people have optimised the shit out of it.
Enormous online and open source community both for support and providing features as opposed to noSQL technologies (look what happened to Cassandra)
In my opinion, all the above questions matter to me when I choose a piece of technology. Hey well, if it's a Sunday evening project that you want to whip up with little real world consequences then do what whims you but if it's slightly more serious then please consider these questions.
SQL hasn't gone away (even in noSQL). It's a mistake to think about this as an either/or argument. NoSQL is an alternative that people need to consider when it fits, that's all.
Documents can be stored in non-relational databases, like CouchDB or even in MySQL (it borders on abuse but still). A Relational database in principle could make a very good NOSQL solution
Check out this hilarious video. This gives a different perspective on this topic :)
I chose MongoDB for my "Social" application because of the flexibility of the schema and scalability/performance. MongoDB has allowed me to adjust my schema without having to make drastic code changes and makes reading/finding data very easy.
I also chose MongoDB as a learning experience. I wanted to know what all the fuss was about with these "noSQL" databases...and now I know why. MongoDB is awesome in my opinion and definitely worth looking at for a Social network that requires scalability and performance. Node.js would also be an excellent choice for the API ;)
neither.
Go with a network/graph database, you will not regret. My current favorite is Neo4j.
http://neo4j.org/
note: Not related to Neo4J
I think the latest version of Neo4J has a sql interface, just in case you would need SQL compatibility. Otherwise, do your crud using their native library. It is very fast.
If you would need to visualize the Graph data, which would be very impressive to show to your boss, use yEd package. To export neo4J to a graphml format, use this:
Convert Neo4j DB to XML?
You could front end your relationships in Neo4J and backend it with a relational db or mongodb. I have seen those hybrid architectures as well.
If you project requires actual relationships between certain objects then MySql will be fine. If you are storing things that typically just have inherent data to them, such as a user with messages to other users, then a document style database, such as MongoDB, makes more sense.
You can do relationships in mongo, but they make a lot more sense in a relational database. But, if most of your data is more inherent of a user then mongo makes more sense.
In your case a document type of scheme makes more sense where each user has a list of connected users and their own personal atributes, ect...
I have huge database (kinda wordnet) and want to know if it's easier to use Cassandra instead of MySQL|PostrgreSQL
All my life I was using MySQL and PostrgreSQL and I could easily think in terms of relational algebra, but several weeks ago I learned about Cassandra and that it's used in Facebook and Twitter.
Is it more convenient?
What DBMS are usually used nowadays to store social net's data, relationships between objects, wordnet?
There is nothing like a Silver bullet solution, everything is built to solve specific problem and has its own pros and cons. It is up to you to decide - what problem statement you have and what is best solution that fits your problem. Whether you use Cassandra (NoSQL) or MySQL(RDBMS), it is all driven from your system's requirements. Below are the inputs that will help you in taking better decision while deciding on database.
Why to Use NoSQL
In the case of RDBMS database, making choice is quite easy because almost all the databases like MySQL, Oracle, MS SQL, PostgreSQL in this category offer almost same kind of solutions oriented to the ACID property. When it comes to NoSQL, decision becomes difficult because every NoSQL database offers different solution and you have to understand which one is best suited for your app/system requirement. For example, MongoDB fits for use cases where your system demands schema-less document store. HBase might fit for Search engines, analysing log data, any place where scanning huge, two-dimensional join-less tables is a requirement. Redis is built to provide In-Memory search for varieties of data structures like tree, queue, link list etc and can be good fit for making real time leader board, pub-sub kind of system. Similarly there are other database in this category (including Cassandra) which fits for different problems. Now lets move to original question, and answer them one by one.
When to use Cassandra
Being a part of NoSQL family, Cassandra offers solution for problem where your requirement is to have very heavy write system and you want to have quite responsive reporting system on top of that stored data. Consider use case of Web analytics where log data is stored for each request and you want to built analytical platform around it to count hits by hour, by browser, by IP, etc in real time manner. You can refer to blog post (http://blogs.shephertz.com/2015/04/22/why-cassandra-excellent-choice-for-realtime-analytics-workload/) to understand more about the use cases where Cassandra fits in.
When to Use a RDMS instead of Cassandra/NoSQL
Cassandra is based on NoSQL database and does not provide ACID and relational data property. If you have strong requirement of ACID property (for example Financial data), Cassandra would not be a fit in that case. Obviously, you can make work out of it, however you will end up writing lots of application code to handle ACID property and will loose on time to market badly. Also managing that kind of system with Cassandra would be complex and tedious for you.
There are many different flavours of "NoSQL" databases. If your application is really like Wordnet perhaps you should look at a graph database such as Neo4j.
I would suggest to analyse your request.
If you are going with more clusters, machines take NoSQL
If your data model is complicated - require efficient structures take NoSQL (no limits with type of columns)
If you fit in a few machines without scales, and you don't need super performance for multi request (as for example in social network - where lot of users send http request), and you don't think you involve saleability take RDBMS (Postgres have some good functions and structures which you can use, like array column type).
Cassandra should work better with large scales of data, multi purpose.
neo4j - would be better for special structures, graphs.
Cassandra and other NoSQL stores are being used for social based sites because of their need for massive write based operations. Not that MySQL and Postgres can't achieve this but NoSQL requires far less time and money, generally speaking.
Sounds like you may want to look at Neo4J though, just in terms of your object model needs.
All different products and they all have their pro's and conn's. What kind of problem do you have to solve?
Huge, as in TB's?
I am developing a Rails application that will access a lot of RSS feeds or crawl sites for data (mostly news). It will be something like Google News but with a different approach, so I'll store a lot of news (or news summaries), classify them in different categories and use ranking and recommendation techniques.
Should I go with MySQL?
Is it worthwhile using IBM DB2
purexml to store the doucuments?
Also Ruby search implementations
(Ferret, Ultrasphinx and others) are
not needed If I choose DB2. Is that correct?
What are the advantages of
PostreSQL in this?
Does it makes sense to use Couch DB in
this scenario?
I'd like to choose the best option but without over-complicating the solution. So I discarded the idea to use two different storage solutions (one for the news documents and other for the rest of the data). I'm also considering only "free" options, so I didn't look at Oracle or MS SQL Server.
purexml is heavier than SQL, so you pay more for your roundtrip between webserver and DB. If you plan to have lots of users, I'd avoid it, your better off letting your webserver cache the requests, thus avoiding creating xml(rss) everytime, if that is what you are thinking about.
I'd go with MySQL because its really good at serving and its totally free, well PostgreSQL is too, but haven't used it so I can't say.
CouchDB could make sense, but not if you plan on doing OLAP (Offline Analysis) of your data, a normal RDBMS will be better at it.
Admitting firstly that I generally don't like mysql, I will say that there has been writing on this topic regarding postgres:
http://oldmoe.blogspot.com/2008/08/101-reasons-why-postgresql-is-better.html
This is always my choice when I need a pure relational database. I don't know whether a document database would be more appropriate for your application without knowing more about it. It does sound like it's something you should at least investigate.
MySQL is probably one of the best options out there; light, easy to install and maintain, multiplatform and free. On top of that there are some good free client tools.
Something to think about; because of the nature of your system you will probably have some tables that will grow quite a lot very quickly so you might want to think about performance.
Thus, MySQL supports vertical partitioning but only from V 5.1.
It sounds to me the application you will build can easily become a large-scale web app. I would suggest PostgreSQL, for it has been known for its reliability.
You can check out the following link -- Bob Ippolito from MochiMedia tells us why they ditched MySQL for PostgreSQL. Although the posts are more than 3 years old, the issues MySQL 5.1 has recently tend to prove that they are still relevant.
http://bob.pythonmac.org/archives/category/sql/mysql/
MySQL is good in production. I haven't used PostgreSQL for rails, but it's a good solution as well.
In the dev and test environments I'd start out with SQLite (default), and perhaps migrate to your target DB in the test environment as you move closer to completion.