SQL (MySQL) vs NoSQL (CouchDB) [closed] - mysql

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
I am in the middle of designing a highly-scalable application which must store a lot of data. Just for example it will store lots about users and then things like a lot of their messages, comments etc. I have always used MySQL before but now I am minded to try something new like couchdb or similar which is not SQL.
Does anyone have any thoughts or guidance on this?

Here's a quote from a recent blog post from Dare Obasanjo.
SQL databases are like automatic transmission and NoSQL databases are like manual transmission. Once you switch to NoSQL, you become responsible for a lot of work that the system takes care of automatically in a relational database system. Similar to what happens when you pick manual over automatic transmission. Secondly, NoSQL allows you to eke more performance out of the system by eliminating a lot of integrity checks done by relational databases from the database tier. Again, this is similar to how you can get more performance out of your car by driving a manual transmission versus an automatic transmission vehicle.
However the most notable similarity is that just like most of us can’t really take advantage of the benefits of a manual transmission vehicle because the majority of our driving is sitting in traffic on the way to and from work, there is a similar harsh reality in that most sites aren’t at Google or Facebook’s scale and thus have no need for a Bigtable or Cassandra.
To which I can add only that switching from MySQL, where you have at least some experience, to CouchDB, where you have no experience, means you will have to deal with a whole new set of problems and learn different concepts and best practices. While by itself this is wonderful (I am playing at home with MongoDB and like it a lot), it will be a cost that you need to calculate when estimating the work for that project, and brings unknown risks while promising unknown benefits. It will be very hard to judge if you can do the project on time and with the quality you want/need to be successful, if it's based on a technology you don't know.
Now, if you have on the team an expert in the NoSQL field, then by all means take a good look at it. But without any expertise on the team, don't jump on NoSQL for a new commercial project.
Update: Just to throw some gasoline in the open fire you started, here are two interesting articles from people on the SQL camp. :-)
I Can't Wait for NoSQL to Die (original article is gone, here's a copy)
Fighting The NoSQL Mindset, Though This Isn't an anti-NoSQL Piece
Update: Well here is an interesting article about NoSQL
Making Sense of NoSQL

It seems like the only real solutions today revolve around scaling out, or sharding. All modern databases (NoSQLs as well as NewSQLs) support horizontal scaling right out of the box, at the database layer, without the application needing its own sharding code.
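For contrast, here is a minimal sketch of the kind of sharding code an application would otherwise have to carry itself. The node names and the `shard_for` and `group_by_shard` helpers are hypothetical, and hash-modulo routing is only one simple scheme, not what any particular product does.

```python
import hashlib

# Hypothetical shard names; a real deployment would map these to connection settings.
SHARDS = ["db-node-0", "db-node-1", "db-node-2", "db-node-3"]

def shard_for(user_id, shards=SHARDS):
    """Pick a shard by hashing the key, so the same user always lands on the same node."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]

def group_by_shard(user_ids, shards=SHARDS):
    """Group keys by shard, as a fan-out query would do before aggregating results."""
    groups = {name: [] for name in shards}
    for uid in user_ids:
        groups[shard_for(uid, shards)].append(uid)
    return groups

print(shard_for("alice"))
print(group_by_shard(["alice", "bob", "carol", "dave"]))
```

Note that plain modulo routing moves most keys to a different node whenever the number of shards changes, which is one reason people prefer to have the database layer handle this.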
Unfortunately, for good old trusted MySQL, sharding is not provided "out of the box". ScaleBase (disclaimer: I work there) makes a complete scale-out solution, an "automatic sharding machine" if you like. ScaleBase analyzes your data and SQL stream, splits the data across DB nodes, and aggregates results at runtime, so you won’t have to!
And it's a free download.
Don't get me wrong, NoSQLs are great, they're new, new is more choice and choice is always good!! But choosing NoSQL comes with a price, make sure you can pay it...
You can see here some more data about MySQL, NoSQL...: http://www.scalebase.com/extreme-scalability-with-mongodb-and-mysql-part-1-auto-sharding
Hope that helped.

One of the best options is MongoDB (a NoSQL database), which is built to scale. It stores large amounts of data (big data) as documents, unlike the rows and tables of SQL. It is fast and supports sharding of the data. It uses replica sets, multiple servers with one primary database server as the base, to protect the data. It is language independent and flexible to use.
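To make the "documents instead of rows and tables" point concrete, here is a small sketch in plain Python (no database driver involved). The user, the field names, and the `to_rows` helper are made up for illustration; they only show how one nested document corresponds to several relational rows.

```python
import json

# A hypothetical user stored as a single document, with messages nested inside it.
user_doc = {
    "_id": "u1",
    "name": "Alice",
    "messages": [
        {"id": "m1", "text": "Hello"},
        {"id": "m2", "text": "Second message"},
    ],
}

def to_rows(doc):
    """Flatten the document into the rows a relational schema would hold."""
    user_row = (doc["_id"], doc["name"])
    message_rows = [(m["id"], doc["_id"], m["text"]) for m in doc["messages"]]
    return user_row, message_rows

user_row, message_rows = to_rows(user_doc)
print(json.dumps(user_doc, indent=2))  # document form: one nested record
print(user_row)                        # relational form: one users row...
print(message_rows)                    # ...plus one messages row per message
```

Reading a user with all their messages is a single lookup in the document form, while the relational form needs a join but makes it easier to query messages on their own.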

Related

Planning for database scaling and schema changes [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I'm doing research before I create my social network database and I've found a lot of questions/resources pertaining to graph and key-value databases for social networks. I understand there are a TON of different options and ways to implement the DB. I also understand that what the big companies do is complex and way above what I currently need (1b+ users). I also know each of the big companies have revamped their databases to account for the insane scaling they go through.
I don't know how the network will grow, and I don't believe I can accurately create a model that will scale to 1m users (due to unknowns such as how people will use it, how often people post, comment, etc). But I can at least try to create a database that will be easiest to scale when (if) the need arises.
Do most companies create a database to handle up to 1k users, then once they grow, they revamp it for 10k users, then 100k, etc? If they do, at each of these arbitrary numbers (because of the unknowns listed above), do companies typically change a few tables/nodes/etc, or do they completely recreate the database to take advantage of new technologies (such as moving from SQL to graph)?
I want to pick the best solution, but I'm finding the decision between graph, key-value, SQL, among others very difficult, especially with no data to know which relationships/data are most important. I believe I can create a solid system using a graph that can support up to 10k users, but I'm worried about potentially having to completely recreate the database as the system grows. Is this a "worry now to avoid issues" problem, or an "implement now and adapt later" type of problem?
Going further, if I do need to plan on complete DB restructures, does it typically make sense to use a Multi-Model NoSQL DBMS (such as OrientDB or ArangoDB)?
I personally think you are asking premature questions.
Seriously, even with a bad model, a database can handle 10k users.
You think about scaling, but the hardest problem is not scaling, it is to come to the point where you need to scale.
I'm sure everybody wants 1bn users, but then you are already dreaming about having a social network with 200 times more users than Github itself ? (Github has ~ 5 million users).
Also, even if you think ahead, you will definitely refactor again and again over the years, and you will have more than one persistence layer, be sure of it.
Code and code good, stay lean, remain able to change quickly, deploy, show to users, refactor, test, deploy and show to users in the same day. These are the things you need to do now, not asking questions about a problem you don't have yet, you definitely have a lot of other problems to solve now ;-)
UPDATE
Based on your comment, you might need to accept that there are questions we simply cannot answer, because we don't know your exact requirements.
I have a simple app, which uses 4 persistence layers, and this app is not yet online. I'll give you my "why" for each one and its use case:
Neo4j
It is the core of the application data. I use it because I love it, I know it very well (it is my job) and, as the concept of the app is quite new and can evolve rapidly, having a schemaless DB cuts down a lot of the refactoring work. Also, building the app keeps bringing up new use cases, which makes Neo4j a good choice when you need to add features without breaking what has already been done.
MySQL
I use it for User accounts and profiles. Why ? Because the framework I use already has a lot of bundles integrating this kind of stuff in a couple of lines of code, the bundles are well maintained and if I would use (currently) neo4j for it, I will have to reinvent the wheel. Also all the modules I use evolve in stability and compatibility with the framework.
Of course the mysql data is coupled (minimally) with the neo4j one. But I know that this kind of data will not evolve that much, so Mysql is a good choice and in case I have to refactor some points, this will not be a huge pain.
Redis
I use Redis for storing analytics data, Redis is quite flexible and I can easily create new keys and add data on top of it.
RabbitMQ
I use a lot of message queues. Why? For testing refactoring. I can easily process messages with multiple consumers, which lets me test multiple database layers, changes, and new features while the app is running.
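The multiple-consumer idea can be sketched without a broker. The `FanOut` class below is a hypothetical in-process stand-in, not RabbitMQ's API: it copies every message to each subscribed consumer, so a current persistence layer and a refactored candidate receive the same traffic and can be compared afterwards. The two stores and their layouts are made up for illustration.

```python
from collections import defaultdict

class FanOut:
    """In-process stand-in for a broker that copies each message to every consumer."""
    def __init__(self):
        self.consumers = []

    def subscribe(self, handler):
        self.consumers.append(handler)

    def publish(self, message):
        for handler in self.consumers:
            handler(message)

# Two hypothetical persistence layers: the current one and a refactored candidate.
current_store = defaultdict(list)
candidate_store = defaultdict(list)

def write_current(msg):
    current_store[msg["user"]].append(msg["text"])

def write_candidate(msg):
    # The candidate keeps the same data under a different layout (lower-cased key).
    candidate_store[msg["user"].lower()].append(msg["text"])

bus = FanOut()
bus.subscribe(write_current)
bus.subscribe(write_candidate)

for msg in [{"user": "Alice", "text": "hi"}, {"user": "Bob", "text": "yo"}]:
    bus.publish(msg)

# Compare what both layers ended up with to validate the refactoring.
same = sorted(map(tuple, current_store.values())) == sorted(map(tuple, candidate_store.values()))
print(same)
```

With a real broker the consumers would run as separate processes, but the principle is the same: the producer does not need to know how many layers are listening.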
You will refactor ! Just try to keep it as simple as possible.

RDBMS vs NoSQL for CRM, CMS and other financial Systems [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
I've read the whole SQL vs NoSQL stuff out there on the Internet (spent a few days on it so I have the right to call it that :) ) and still have a feeling I'm far away from being able to decide which platform our products should go with.
We're about to start designing a new set of products that mostly fit CRM/CMS categories, I'd say several B2B, B2C, B2E, E-Commerce as well as other financial and banking apps. So it's gonna be a complex system with dozens of databases solving different tasks. Let's concentrate on the DB area. I found this article is particularly interesting for DB systems in the world of enterprise. So the actual problem is:
Is it better to stay with good old RDBMS such as MySql (yes, it has to be open-source, that's the only requirement) or start off with NoSQL such as MongoDB/CouchDB (I guess Cassandra is too scalable for CRM, it's not going to be a very distributed and heavily clustered system. Up to 4 strong guys will do the job perfectly)???
As additional details I can say that a lot of media stuff and docs will be engaged in the system, this is a must for stores, markets, HR systems. And that the consumers of the storage will be web apps mainly.
Would it be better to split the DB back-end into two parts: RDBMS serving relational data and NoSQL for the media storage?
What do you think? If you have examples or similar experience, any help would go a long way toward avoiding future problems. So thank you guys in advance!
There are NoSQL (NewSQL) databases that are fully ACID compliant that you could consider. I would use one of those to handle the transactional CRM data. There are simply too many benefits using these compared to traditional relational databases:
Much better performance
Schemaless
Some let you remove the ORM completely and use the created objects automatically
Some have integrated web server with REST/JSON support, that would be nice for you since you will work with web apps for the end user.
The ACID part is very important if you will build a CRM. I once built a CRM system that uses a NoSQL database, and the performance made it possible to add features we never would have considered if we had used a traditional RDBMS.
I like the idea that you should put the media and documents into a CDN and then refer to them from your database.
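As a rough sketch of "files in the CDN, references in the database": the table stores only a path, and the application builds the public URL. The CDN host, table, and column names are hypothetical, and SQLite from the standard library is used only so the sketch runs anywhere; the same idea applies to whichever database you pick.

```python
import sqlite3

CDN_BASE = "https://cdn.example.com"  # hypothetical CDN host

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL,"
    " title TEXT NOT NULL,"
    " cdn_path TEXT NOT NULL)"  # only the path is stored, never the file bytes
)
conn.execute(
    "INSERT INTO documents (customer_id, title, cdn_path) VALUES (?, ?, ?)",
    (42, "Contract scan", "/customers/42/contract.pdf"),
)

def document_urls(conn, customer_id):
    """Return (title, full CDN URL) pairs for one customer's documents."""
    rows = conn.execute(
        "SELECT title, cdn_path FROM documents WHERE customer_id = ? ORDER BY id",
        (customer_id,),
    ).fetchall()
    return [(title, CDN_BASE + path) for title, path in rows]

print(document_urls(conn, 42))
```

Keeping the host out of the stored value means the CDN can be swapped later without rewriting every row.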
Your open source requirement could be a bit of a showstopper though.
I wrote an article on the subject that might give you some advice on selecting a database:
http://www.ulitzer.com/node/2636237

Should we be converting to PostgreSQL from MySQL? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
Now that MySQL is in Oracle's hands, do you think it's a good idea to switch to using PostgreSQL for new applications instead? (Also what do you think about converting existing applications?)
I've used both DB systems before and while PostgreSQL is great for its licensing terms and standards compliance, MySQL is definitely easier to get up and running quickly. (I make this as a personal observation, I know you might disagree...)
Edit:
I should clarify... I don't want this to be a MySQL/PostgreSQL is better than PostgreSQL/MySQL debate. I like both DB systems and am happy using both (and really for the complexity of most of the applications I'm working on, it's much of a muchness). I'm just in a position where I'm trying to look forward and consider the stability of my technology base before committing myself to a particular course. If you have gone through a similar process and have some kind of migration plan in mind I would like to hear from you regarding what that is and why you decided on it.
Installing is a one-time job... kind of. It depends, of course, but PostgreSQL isn't much harder to install than MySQL, if harder at all. It's the day-to-day cost of ownership that matters. As a developer I prefer PostgreSQL over MySQL, as the latter behaves differently from version to version (they're still playing catch-up to the SQL standard and probably always will). Also, MySQL is a pain to administer sometimes. What does it matter if it takes ten minutes more to install if you must wait for hours when adding a column to a table or doing other trivial tasks? Finally, I think the MySQL environment was too turbulent even before the Oracle takeover, with Oracle already owning InnoDB, MariaDB. I think it is a general mess. So yes, I'd migrate, but for other reasons.
If you actually prefer MySQL over PostgreSQL I'd lay out a migration plan just to be ready if need arises, as a kind of lazy proactiveness ...
Look at it this way: regardless of what Oracle says, the fact remains that they could decide to do Something Bad with MySQL at any time. Maybe they will, and maybe they won't, but why take the risk (for new projects, at least) when you can just use PostgreSQL?
Given the choice, I'd just as soon go with Postgres myself. It seems to be a very stable project upon which to base my own work. Long history, under active development, good documentation, etc.
Since you've indicated that you're happy working with either one, I say go with Postgres for new projects and don't worry about converting existing projects unless and until Oracle does something with MySQL that gives you cause for concern.
I am no fan of Oracle, but the company has come forward with a 10 point commitment to existing MySQL customers.
So at least as of now, I don't see any cause for worry. Any database migration will require some effort and cost in terms of time and money. So if I were you, I'd hold on for a while before doing anything drastic as a database migration.
Even if MySQL does go south, there's MariaDB, which was started by the founder of MySQL. It's a drop in replacement and has some quite exciting new features.
http://askmonty.org/wiki/index.php/MariaDB
I've been giving a go on my development environment and I've been liking it so far.
See the article:
Save MySQL by letting Oracle keep it GPL
This answers your question amongst other things.
Good lord.
O.k. so let's just get it out in the open. I am not a MySQL fan. I think it's broken. However I am biased (http://www.commandprompt.com/). That said, here are the benefits of PostgreSQL.
PostgreSQL scales farther than MySQL. MySQL does really well if you have a limited number of CPUs. If you get above 4, PostgreSQL will just go farther, longer.
PostgreSQL's license allows it to never be bought. You don't have to worry about a single entity taking it over. At present there are at least a dozen actively supporting companies including, Red Hat, PgExperts, Command Prompt, OmniTI, EnterpriseDB, Fujitsu and Oracle (yep).
PostgreSQL's feature set is remarkable. Just look at it.
However, and this is the most important point: do what your business requires. MySQL is a decent database when used for its purpose.

MySQL, MSSql, Oracle: When to use which? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
What's the limitation?
Is there a specific volume of data each can handle regardless of disk space?
When to use what assuming licensing is not a problem?
This is a very nuanced question that really cannot be easily answered, as each situation can provide many pluses and minuses. Also, MySQL being owned by Oracle now and several branches off of the main functionality means that MySQL != MySQL anymore.
If you are looking at really, really big data sets, then you will likely have to break with the RDBMS world and start to look at things like MapReduce and other large data set processing technologies.
I have personally worked with all three over the past decade or so from the application perspective. They all have their advantages: MSSQL works well with other Microsoft technologies like LINQ, whereas MySQL has large open community support, and Oracle is the workhorse of the commercial sector, with lots of ability to embed application logic right into the database.
Again, it really depends on the application, the situation, the skills of the people who will maintain it after it is developed, commercial considerations, hardware and platform considerations, etc etc etc.
It depends what you are trying to do and obviously it has to do with cost.
MySQL and Postgres are very widely used by a huge number of startups because they are open source and there is a lot of support out there for people using them.
MSSQL is good if you are using MS programming languages because it is easy to connect to and use.
I have never used Oracle, but I know people use it a lot for data warehousing, so it can't have that bad a name.
All of these will suffer from similar issues when scaling because they are RDBMS databases. They do also have decent ways to get round it and with a decent ORM used in your code then it shouldn't matter what you use.
Pick the one that all the developers are comfortable with
I'd say if you want to compare apples to apples, then it is MySQL vs SQL Express, vs Oracle Express.
Or if you have $, then it is the MySQL support license, MS-SQL Standard, vs whatever Oracle's cheapest offering is.
In my experience, once you choose a language, you've chosen your DB: e.g. PHP goes best with MySQL, Java goes well with Oracle, and C# goes well with MSSQL.
Similarly, if you choose your OS, then unix flavors run MySQL or Oracle, but MSSQL is windows only. MySQL and Oracle work on both unix and windows of course.
If you need to buy many machines, then not having to pay OS licenses for the server helps in scaling.
As to skaffman's point, you may want to have a look at Postgres if MySQL isn't scaling for you. It is more mature and robust than MySQL and is open source. The time to make the switch is highly dependent on your application environment; however, if you need clustering and replication to work properly 100% of the time, then Postgres will not let you down (as MySQL has for me in the past).
It would help narrow things a great deal if you'd provide details like whether or not you intend to distribute the database along with your software; whether your system will be hosted; how much data; etc.
Don't assume anything with regard to licensing. Get a lawyer, maybe even one who specializes in open source law.
"...regardless of disk space..." - capacity always depends on this. Where do you think the data goes? Better to think about things like sharding your data, RAID, clustering, replication, etc.
I would worry about any system whose developer had to come to a forum like this to ask that kind of question. You should have people on staff with sufficient skill and knowledge to have a strong opinion on this sort of thing.
Perhaps one variable which people overlook in these cases is the availability of expert support. Okay, so currently there's an oversupply of people who can help you with DB issues, efficiency issues, disaster recovery etc. However, this may not always be the case in the future, and it's the applications you use it for that may be the defining issue. Are there people in your organization who have experience in one or more of the relevant databases? (As it happens, I believe that someone who's become proficient in, say, Oracle can become fairly competent in SQL Server or MySQL in a fairly short space of time.) You state it's going to be used for your financial systems. Perhaps you really need input from a consultant who's worked on implementing and/or supporting financial systems; for example, I understand that Sybase is popular in City type firms. Or perhaps there's an off-the-shelf package that utilises a preferred database? Try and define exactly what your system(s) needs to do first.
Is the application buy or build?
If Buy, does it support all three? Talk to the app vendor about the differences.
If Build, then is it an in-house build or contracted out? If contracting out, put out your requirements and let the suppliers put their arguments.
If in-house build, then first look at why you are not contracting out. Normally it is because you already have an in-house capability, so look at that expertise.
You want some sizing information first.
Are you talking data volumes in megabytes, gigabytes or terabytes ?
What are your uptime requirements, backup (recovery time / recovery point) ?
How much concurrent activity ? Is that peak ?
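The sizing questions above can be turned into a back-of-envelope calculation. The sketch below is only that: the function names, the 50% index allowance, the peak factor of 5, and the sample inputs are all assumptions to be replaced with your own figures.

```python
def estimate_storage_gb(users, rows_per_user, bytes_per_row, index_overhead=0.5):
    """Rough data size in GB: raw rows plus a fractional allowance for indexes."""
    raw_bytes = users * rows_per_user * bytes_per_row
    return raw_bytes * (1 + index_overhead) / 1e9

def peak_queries_per_second(daily_queries, peak_factor=5):
    """Average rate over a day, scaled by an assumed peak-to-average ratio."""
    return daily_queries / 86_400 * peak_factor

# Hypothetical inputs: 100k users, 2,000 rows each at 500 bytes, 8.64M queries a day.
print(estimate_storage_gb(users=100_000, rows_per_user=2_000, bytes_per_row=500))
print(peak_queries_per_second(daily_queries=8_640_000))
```

Even crude numbers like these tell you whether you are in megabyte, gigabyte, or terabyte territory, which is usually enough to rule options in or out.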
Generally any database system is fine for data storage and retrieval. High-end analysis, load balancing, replication, management, backup/recovery, auditability, security are all areas you may need to consider.

MySQL vs PostgreSQL for Web Applications [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I am working on a web application using Python (Django) and would like to know whether MySQL or PostgreSQL would be more suitable when deploying for production.
In one podcast Joel said that he had some problems with MySQL and the data wasn't consistent.
I would like to know whether anyone has had such problems. Also, when it comes to performance, which one can be tweaked more easily?
A note to future readers: The text below was last edited in August 2008. That's nearly 11 years ago as of this edit. Software can change rapidly from version to version, so before you go choosing a DBMS based on the advice below, do some research to see if it's still accurate.
Check for newer answers below.
Better?
MySQL is much more commonly provided by web hosts.
PostgreSQL is a much more mature product.
There's this discussion addressing your "better" question
Apparently, according to this web page, MySQL is fast when concurrent access levels are low, and when there are many more reads than writes. On the other hand, it exhibits low scalability with increasing loads and write/read ratios. PostgreSQL is relatively slow at low concurrency levels, but scales well with increasing load levels, while providing enough isolation between concurrent accesses to avoid slowdowns at high write/read ratios. It goes on to link to a number of performance comparisons, because these things are very... sensitive to conditions.
So if your decision factor is, "which is faster?" Then the answer is "it depends. If it really matters, test your application against both." And if you really, really care, you get in two DBAs (one who specializes in each database) and get them to tune the crap out of the databases, and then choose. It's astonishing how expensive good DBAs are; and they are worth every cent.
When it matters.
Which it probably doesn't, so just pick whichever database you like the sound of and go with it; better performance can be bought with more RAM and CPU, and more appropriate database design, and clever stored procedure tricks and so on - and all of that is cheaper and easier for random-website-X than agonizing over which to pick, MySQL or PostgreSQL, and specialist tuning from expensive DBAs.
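If you do decide to measure, a minimal timing harness might look like the sketch below. It is an assumption-laden outline, not a benchmark methodology: `time_workload` and `compare` are hypothetical helpers, and the two lambda workloads are placeholders for functions that would run your real queries against each candidate database through its own driver.

```python
import statistics
import time

def time_workload(run_once, repeats=5):
    """Return the median wall-clock seconds of `run_once` over several repeats."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

def compare(workloads, repeats=5):
    """Time each named workload and return (name, seconds) pairs, fastest first."""
    results = [(name, time_workload(fn, repeats)) for name, fn in workloads.items()]
    return sorted(results, key=lambda pair: pair[1])

# Placeholder workloads; in practice each would run your application's queries
# against one candidate database.
workloads = {
    "candidate_a": lambda: sum(range(10_000)),
    "candidate_b": lambda: sum(range(20_000)),
}
for name, seconds in compare(workloads):
    print(f"{name}: {seconds:.6f}s")
```

The median is used rather than the mean so that a single slow run (cold cache, background noise) does not dominate the result.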
Joel also said in that podcast that comment would come back to bite him because people would be saying that MySQL was a piece of crap - Joel couldn't get a count of rows back. The plural of anecdote is not data. He said:
MySQL is the only database I've ever programmed against in my career that has had data integrity problems, where you do queries and you get nonsense answers back, that are incorrect.
and he also said:
It's just an anecdote. And that's one of the things that frustrates me, actually, about blogging or just the Internet in general. [...] There's just a weird tendency to make anecdotes into truths and I actually as a blogger I'm starting to feel a little bit guilty about this
Just chiming in many months later.
The geographical capabilities of the two databases are very, very different. PostgreSQL has the exceptional PostGIS extension. MySQL's geographical functionality is practically zero in comparison.
If your web service has a location component, choose PostgreSQL.
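To give a feel for what a location component involves, here is a sketch of a radius search done in application code with the haversine formula. This is not PostGIS's API; the `haversine_km` and `within_radius` helpers and the sample points are hypothetical, and the Earth is treated as a sphere. A spatial extension does this kind of query inside the database, with spatial indexes, instead of scanning every point.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))

def within_radius(points, center, radius_km):
    """Filter (name, lat, lon) tuples by distance; scans every point, no index."""
    clat, clon = center
    return [name for name, lat, lon in points
            if haversine_km(clat, clon, lat, lon) <= radius_km]

# Hypothetical points on the equator, roughly 11 km and 1,100 km from the center.
points = [("near", 0.0, 0.1), ("far", 0.0, 10.0)]
print(within_radius(points, center=(0.0, 0.0), radius_km=50))
```

The linear scan is fine for a handful of points but becomes the bottleneck quickly, which is where database-side geographic support earns its keep.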
I haven't used Django, but I have used both MySQL and PostgreSQL. If you'll be using your database only as a backend for Django, it doesn't matter much, because it will abstract away most of the differences. PostgreSQL is a little more scalable because it doesn't hit the brick wall as fast as MySQL as data-size/client-count increase.
The real difference comes in if you are doing a new system. Then I'd recommend PostgreSQL hands down, because it has a lot more features which make your DB layer much more customizable, so that you can fine-tune it to any requirements you might have.
Although it's a bit out of date, it would be worth reading the MySQL Gotchas page. Many of the items listed there are still true, to the best of my knowledge.
I use PostgreSQL.
I use both extensively. My choice for a particular project boils down to:
Licensing - Are you going to distribute your app (IANAL)
Existing Infrastructure and Knowledge Base
Any special sauce you have to have.
By special sauce I mean things like:
Easy/cheap replication = MySQL
Huge dataset problems with small results = PostgreSQL. Use the language extensions, and have very efficient data operations. (PL/Python, PL/TCL, PL/Perl, etc)
Interface with R Statistical Libraries = PostgreSQL PL/R available in debian/ubuntu
Well, I don't think you should be using a different database brand in anything past development (build, staging, prod) as that will come back to bite you.
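One way to follow that advice in a Django project is to fix the engine once and vary only the connection details per environment. The sketch below is a hypothetical settings fragment: the environment names, hosts, database names, and the `database_settings` helper are assumptions, and PostgreSQL is picked only as an example (the same pattern works with MySQL). It goes slightly further than the advice by keeping development on the same engine too.

```python
import os

ENGINE = "django.db.backends.postgresql"  # the same engine in every environment

# Hypothetical hosts and database names per environment.
ENVIRONMENTS = {
    "dev": {"NAME": "app_dev", "HOST": "localhost"},
    "staging": {"NAME": "app_staging", "HOST": "db.staging.example.com"},
    "prod": {"NAME": "app", "HOST": "db.prod.example.com"},
}

def database_settings(env):
    """Build a Django-style DATABASES dict; only connection details vary by environment."""
    settings = {"ENGINE": ENGINE, "PORT": "5432"}
    settings.update(ENVIRONMENTS[env])
    settings["USER"] = os.environ.get("DB_USER", "app")
    settings["PASSWORD"] = os.environ.get("DB_PASSWORD", "")
    return {"default": settings}

DATABASES = database_settings("dev")
print(DATABASES["default"]["ENGINE"], DATABASES["default"]["HOST"])
```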
As I understand it, PostgreSQL is a more 'correct' database implementation, while MySQL is less correct (less compliant) but faster.
So if you are pretty much writing a CRUD application, MySQL is the way to go. If you require certain features out of your database (if you're not sure, then you don't), then you may want to look into PostgreSQL.
If you are writing an application which may get distributed quite a bit on different servers, MySQL carries a lot of weight over PostgreSQL because of the portability. PostgreSQL is difficult to find on less than satisfactory web hosts, albeit there are a few. In most regards, PostgreSQL is slower than MySQL, especially when it comes to fine tuning in the end. All in all, I'd say to give PostgreSQL a shot for a short amount of time, that way you aren't completely avoiding it, and then make a judgement.
Thank you. I've used Django with MySQL and it's fine. Choose your database based on the features you need. It's hard to compare MySQL and Postgres. Better to compare Postgres to SQL Server.
@WolfmanDragon
PostgreSQL has (tiny) support for objects, but it is, by nature, a relational database. From its about page:
PostgreSQL is a powerful, open source relational database system.
MySQL is a relational database management system while PostgreSQL is an object-relational database management system. PostgreSQL is suited well for C++ or Java developers, as it gives us more control over how queries are written. ORDBMS also gives us Objects and User Defined Types. The SQL queries themselves are much closer to the ISO standards than MySQL.
Do you need an ORDBMS or a RDBMS? That will better answer your question.