Disadvantages of MySQL versus other databases - mysql

Every single book that teaches programming (or almost anything else) starts off with a whole bunch of spiel on why what it's about (C++, MySQL, waterskiing, skydiving, dentistry, whatever) is the greatest thing in the world. So I open the MySQL O'Reilly book, and read the intro, and get the traditional sermon. The main points that the book mentioned were:
MySQL has been shown to have tied Oracle as the fastest and most scalable database software.
It's free and open source.
Sounds pretty convincing, but I know there's always at least two sides of every story. I knew I needed to be disillusioned when I saw someone suggest to someone to use Oracle instead of MySQL and thought, "Why in the world would you want to do that?!", just because of the few paragraphs I'd read, with no other justification. So lets investigate the other side of the story:
What are some reasons NOT to use MySQL?

Here's just a random list of stuff that popped into my head. It's CW, so feel free to add to it as necessary.
Oracle provides a top notch ERP built on their database. If your company is subject to Sarbanes-Oxley regulations, this is quite a bit above "crucial."
SQL Server licenses come with Analysis Services, Integration Services, and Reporting Services. If you want to do anything with OLAP, ETL, or reporting, these three are great applications that are built on the SQL Server stack.
SQL Server has native .NET data types (in 2008). Absolutely brilliant for .NET shops dealing with geospatial datasets.
MySQL does not support check constraints.
SQL Server includes the over clause, which helps when dealing with the "top n rows in each group" problem. Essentially, you can do aggregate functions partitioned over the dataset any way you'd like.
SQL Server uses Kerberos and Windows authentication natively. MySQL does not tie into Active Directory.
Superior performance on subqueries (almost any database has subquery performance that is superior to MySQL's)
Oracle, SQL Server, PostgreSQL and others have a richer set of join algorithms available to them; this means joins can often be performed faster, especially when large tables are involved.

MySQL has been shown to have tied oracle as the fastest and most scalable database software.
Making that statement about any two database systems is probably enough to throw the book away without reading the rest. Database systems are not commodities that can be compared with a couple lines of information, and will not be for the foreseeable future.
One reason that the statement is obviously false is that MySQL has very limited plan choices available. For instance, MySQL can't use merge join or hash join -- two fundamental algorithms that have useful performance characteristics. That's pretty much the end of the story for many query workloads. It is trivial to show a reasonable query that is orders of magnitude faster with a merge join.
There are plenty of other criticisms of MySQL versus XYZ and vice-versa. My point is that this is a complex issue, and the book is drastically oversimplifying. If you're getting involved in databases at all, you need to spend time diversifying your knowledge and understanding fundamentals.
My personal opinion is that MySQL and SQLite are the worst places to start. Pick something like Oracle (which can be downloaded free of charge for learning/evaluation, which many don't realize), PostgreSQL (BSD license), or MS SQL. FirebirdSQL might be good, too. Once you familiarize yourself with a few systems, you'll be able to make an informed choice about whether the trade-offs MySQL makes are right for you.

Everyone seems to be missing one of the main reasons to stick with Oracle/MS. You've already got a stable full of DBAs that know those products inside and out.

The default collation in mysql is case-insensitive. This is not a problem per se, but I think this strange default is an indication that it was targeted at hobby-developers, rather than professionals. This is a big assumption, but I'd think any professional would expect a database to compare strings for identity by default (i.e. using a binary collation).
Manipulation of tables during transactions causes implicit COMMITs. While this might not look grieve at the first glance, you will notice that you cannot cannot work under ACID conditions if altering/creating tables is an inherent part of your application.

MySQL can certainly match or beat Oracle in speed. I've done it numerous times myself. Ok, so I had to use various table types like black hole, merge, innodb, and myisam in just the right laces. And it took me a few days to get everything working just right. The Oracle DBA got things working in an hour or two.
MySQL is fine for 98% of the sites out there, maybe more. But it is fairly easy to bring it to a crawl without a lot of data if you don't know what you are doing. Oracle is quite a bit harder to bring to a crawl, but it can still be done. I've worked with both with datasets in the hundreds of millions of records (tiny by some measures). MySQL takes quite a bit more attention.
No database can scale indefinitely, which is why nosql "databases" are becoming so popular. I think the real question is if MySQL is "good enough" for what you need to do. The price is certainly right. The same could be said about PHP.
Why does Facebook use MySQL? Could you imagine what it would cost them to buy enough Oracle licenses!? It's good enough.

The future is of sun (the company behind mysql) is unclear and you don't know whether there will be a company to back the product.

MySQL is very tolerant of ambiguities -- something you don't want in a database system. Here are a few examples off the top of my head:
As another poster stated, CHAR and VARCHAR columns are case-insensitive, already a pretty bad sign.
You can INSERT into a table that has a column without a default value that is also NOT NULL. Yes, really! Instead of throwing an error, MySQL will pick a value for you based on the data type, e.g. 0 for numbers.
You can use a GROUP BY statement while some columns are neither using an aggregate function, nor included in the GROUP BY statement. The outcome is pretty much random. No warnings or errors here either, in my experience.
MySQL is also far from rock-solid. Just this month, I discovered a bug in the (admittedly old, but a "stable release") version of MySQL used by DreamHost that results in data loss. (Certain conditions when creating a table with variable-length rows.)
I've been using MySQL for many years and still do, but would never dream of using it for anything serious, where data loss would be a big problem. It's great for non-mission-critical web sites and blogs though.

I knew I needed to be disillusioned
when I saw someone suggest to someone
to use oracle instead of MySQL and
thought, "Why in the world would you
want to do that?!"
Because your company has been using Oracle for the past ten years, or because you equate enterprise usage with 'must be good' and open-source with 'free crap'. That's just about the only reason. Everyone I know who has worked with Oracle loathes it. Everyone I know who has worked with MySQL, assuming they don't love it, at least consider it a better alternative to Oracle in almost every regard.
SQL RMDBs are so complex though, that in almost every respect there's something one DB does that another doesn't. It is also, unfortunately, a fact of comparing databases that people quote statistics without using properly configured servers. If you have two default configurations for a server, one might be better than the other, but that's about as far as the comparisons usually go. They don't reflect the fact that these gigantic applications have a million little switches and toggles you can use to speed certain things up, increase reliability and generally screw up bad science.

MySQL tends to be a very general purpose database system, you can use it for almost anything that you'd use Oracle, SQL Server, PostgreSQL, DB2, etc for.
However, these different systems have different strengths, PostgreSQL has a ton more functionality than MySQL and can handle some very specific tasks that MySQL struggles with. SQL Server usually integrates with Microsoft products very easily whereas MySQL you'd have to do some extra work to make them play together. Oracle is MASSIVE, they're not just databases and when you're dealing with large, expansive systems Oracle probably has the gear to cover everything under the 1 roof, whereas you'd need to tie a bunch of disparate systems together to have MySQL has your database system.
Whether or not to use MySQL should be based upon whether or not it is reasonable to use MySQL.

Disclaimer: I have been using MySQL since 2001 and still love it, but here are a few reasons that make me doubt about my fidelity...
There are some false arguments (it was true a few years ago) in some of the answers I read. Before making a choice, check MySQL documentation and its up-to-date list of features. You could be surprised.
Each DB server lack functionalities. This is not a real blocking issue if you do not specifically need them.
For me, the main issues are elsewhere:
The time needed to have a bug fixed and published in a stable release. It is a shame. (For some bugs... it takes years (no kidding)!)
The frequency of stable releases.
But since this year, the new issues are:
The number of increasing branches (Percona, Google, Facebook, etc.).
Sun is unclear with his strategy.
Many MySQL employees left the company.

It's free and open source.
True. But keep in mind that MySQL is, in many cases, not free for commercial use. MySQL and the connectors (the official drivers for various languages), are GPL licensed.
If you use, say, the Connector/.NET to connect to MySQL your code have to be GPL compatible. It's dual licensed though, so you can buy an enterprise version under another license - and I believe they have a (either free or just very cheap) program that lets you license the connectors under a different license.
Everyone I know using MySQL is unaware of this :-)

Basically, there are several choices for a database. Frankly, in today's world, DB choice is less important than it was a few years ago. Here are a few issues to consider.
Most of the current database systems in widespread use such as SQL Server (and SQL Server Express), Oracle, MySQL, SQLLite, etc. are relatively standards compliant and can be used somewhat interchangeably. Some serve different niche markets. For example, SQL Server, MySQL, and Oracle are all good choices for large Enterprise applications. SQLLite is very good for applications which deploy on a client and need a local database with a small footprint and minimal configuration. (In my opinion, Oracle is extremely over-priced, is backed by an arrogant unresponsive company. It would never be my first choice on any project. I would only use it if it was mandated by the client or by necessity.)
A high percentage of top-end developers are using tools such as Hibernate(Java)/NHibernate(.NET) to build their data access layers. Hibernate variants strongly encourage developers to start with development of the object model rather than the database model. The Hibernate application then generates the data model automatically--and even handles data model updates. Hibernate variants can be used with any of the major database vendors. Changing your database choice can be as simple and painless as selecting a different database type in your configuration. On a side note, I should mention that while Hibernate and NHibernate are cross-database-compatible, they do not work on the lowest common denominator. The data access code in these applications is often designed to take advantages of special features within a given database engine. For example NHibernate supports access to the NVarchar(Max) data type in SQL Server which allows for very long strings.
In most applications, issues with database performance do not derive directly from the speed of reads and writes. Most of the issues relate to how the application manages the caching of frequently accessed data. For example, in online blog site, it makes sense to cache blog posts once they have been read so they are not repeatedly fetched from the database. This caching mechanism is almost always primarily handled by the application code rather than database server--though database servers do provide some caching. Hibernate/NHibernate have excellent caching support built in as does Microsoft's ASP.NET and their new MVC framework built on top of ASP.NET.
Enterpise databases (SQL Server, Oracle, MySQL) are best for situations where functionality such as replication, clustering, huge datasets, etc. are required.

I don't like MySQL licence : Firebird and PostgreSQL are better
There is no real hotbackup include in the MySQL by Sun
you can also look here which is interresting link and comment !

MySQL is free, but it takes an expert to maintain. Someone who naturally uses the command prompt and is not afraid to experiment. In some cases, MySQL problems are too complex, and the right people to troubleshoot them may not be available for any amount of money.
SQL Server is priced in the middle range. It can be maintained by "normal people", the kind who go home every day on 17:00 and have a natural disinclination to fifty page HOW-TO's. SQL Sever performs well in most instances but can break down in specific scenarios.
Oracle is the most expensive and requires highly paid operators. If you have the money, Oracle is a "safe" choice, because there's nothing Oracle won't do for money.
Three products, three markets!

A couple of pages listing gotchas (such as this and this) make me want to stay as far away from MySQL as possible. Here's a more neutral comparison of Postgres and MySQL.
As for the open source aspect others mentioned: MySQL is open source and free, only if your application is, too. If it's not, you need a commercial license.

My personal story:
Adding a new index to a table of about 10k rows.
MySQL side
about 30 seconds.
Postgres side
about 1 second.

I've worked with MySQL for years, and SQL Server only over the past year. I don't really see one being any easier or harder to use than the other in most cases. I do wish, however, that MSSQL had some of the features that MySQL possesses (e.g. being able to insert multiple rows on a single INSERT statement).

Also, if you don't have to use RDBMS, checkout redis. It is basically memchached with persistence with asynchronous write through. The performance is not on the same scale with MySQL.
Well... I guess the comparison isn't really fair to MySQL since it's not RDBMS...

Related

Database Management System for db-heavy, busy website/application?

EDIT: I've learnt, and it's probably true that YouTube uses MySQL. But it probably would be the enterprise edition and not free edition. The only alternative seems to be PostgreSQL. Long question short - - Can PostgreSQL used instead of MySQL? Is it a very good alternative in any case?
Firstly, I noticed that these are the most common names when it comes to (relational) database management systems - - DB2 (IBM), Oracle Database, Microsoft SQL, Ingres, MySQL, PostgreSQL and FireBird. So, should I presume these are the best?
Okay, of the above - - DB2 (IBM), Oracle Database and Microsoft SQL, the so-called Enterprise DBMSs, come with a bill; while MySQL (exclude enterprise version), PostgreSQL and FireBird are open source and free.
As should be clear from my previous questions here, I plan to build a photo-sharing site (something like Flickr, Picasa), and like any other, it's going to be database-heavy and (hopefully) busy.
Here's what I would love to know: (1) does any one of the free DBMSs stand up to the mark with the paid enterprise DBMSs? (2) Can any of the free DBMSs scale and perform well for enormous and busy databases without too much headbanging and facepalming?
Things in my mind w.r.t the DB:
Mature
Fast
Perform great/fine under heavy load
Perform great/fine as database grows
Scalable (smooth transition)
support for languages (preferably Python, PHP, JS, C++)
Feature-rich
etc (whatever I am missing)
PLZ NOTE: I know Facebook, Twitter etc use (or at least used) MySQL, and I see reports from time to time, how their sysadmins cry over that decision. So, please don't say, XXX uses it, so why can't you. They've started small, I am too. They've made mistakes, I don't want to. I want to keep the scaling-transition smooth. I hope I am not asking too much. Thanks.
"Which is the best database" is a huge question and is the subject of much contention. I've noticed on StackOverflow there is a tendency to close such questions; although the question is interesting, it is also quite unresolvable ;-)
FWIW, I would go with this:
Use what you know
If it doesn't conflict too heavily with the first rule, use something that is free of charge
Use what works with other parts of your stack
Use what you can hire for at reasonable cost (so, maybe not Oracle unless you really have to)
Don't optimise too early. Working slowly is much better than an unfinished, efficient website.
Also, scalability is not really to do with your db platform, but to do with how you design your site. Note also that some platforms scale better when adding more servers (MySQL) and others do better when increasing your server resources (PostgreSQL).
Please note as of today, MySql is not a free project aka as free as postgresql. One of the main reason why i had to switch over to PG. (Thankx to NPGSQL and PgAdmin III, it was a lot easier than it was rumoured)
However MySql does have number of advantages related to applications,addons,forums and looked good on resume.
PostgreSql is a much mature DBMS. It is a objectRDBMS. It has been around for more than 15 years. It is not known to have defaulted on any major issues. It is well known to handle transactions running in millions of rows successfully. The most important is, it's high rate of compliance with SQL standards. Infact in professional circles, it is more of an Oracle of Free RDBMS rather than MySql of popular applications.

Apart from initial cost, are there any other benefits of using MySQL over MSQL server with .net?

I've used both and I've found MySql to have several frustrating bugs, limited support for: IDE integration, profiling, integration services, reporting, and even lack of a decent manager. Total cost of ownership of MSSQL Server is touted to be less than MySQL too (.net environment), but maintaining an open mind could someone point out any killer features of MySql?
I've used MySQL in the past and I'm using MSSQL lately but I can't remember anything that MySQL has and MSSQL can't do.
I think the most killer feature of MySQL it's the simplicity. For some projects you just don't need all the power you can have with a huge system like MSSQL. I have an UNIX heritage and find the simple configuration file like my.ini a killer feature of MySQL.
Also the security system of MySQL is much less robust but it makes the job right for most of applications. I believe MySQL it's killer itself from this point of view, and should stay that way, letting young users being introduced to RDBMS with a simple view first. If your project gets big enough that you are considering switch to a more robust system, then MSSQL can pop as a possibility.
That's what happened to me.
The only thing I can think of, off hand, is locking. SQLServer has traditionally had poor locking strategy that has tripped many people up.
You should use what you prefer, ultimately. Its not as if MySQL is not good enough to compete with MS SQL, eg. Slashdot uses MySQL, so its hardly got problems with high-scalability performance.
Its killer feature though, is that it is free - you can deploy as many of them without worrying one fig about licensing issues. That's more important for the spread of software than anyone could imagine.
(TCO is a difficult thing to calculate - and is advice only ever given from paid consultants and other vested interests. Ignore that. MSSQL is expensive and MySQL is free.)
About 6 years ago I developed a custom e-commernce website using ASP and MySQL for the database. At the time MySQL was clearly a better choice than MSDE which had built in throttling which concerned me enough to use MySQL. Also the difference in coding between using MySQL and MSDE/SQL was not that different or much of a concern.
Now all these years later I'm trying to get the code converted to .NET and even after purchasing commercial MySQL drivers from CRLab. I found that, as you hinted, the IDE integration is just not up to par.
I will say that MySQL is doing a great job even with our database tables approaching 4GB. So when I switch to MSSQL I have to go ahead and get SQL Workstation or higher ($$$), and not use SQL Express which has a 4gb limit.
All of my experience has changed the way I develop new websites. Now, unless it is expected to have a lot of traffic. I use VistaDB and then upgrade to SQL Server if needed. VistaDB is syntax and datasource compatible with SQL Server. And the best part is it is only a single file for the database and a dll for your bin folder.
That's my two cents based on my personal experience with using MySQL in ASP and now .NET.
I work with MSSQL, MySql and PostGres regularly (using .net, java and PHP). One of my favorite things about about MySQL (esp. compared to MSSQL) is the ease with which you can run and restore full database backups.
MSSQL's model of using .bak files is really ugly and time-consuming (topic for another post.) But if you want to do somethign like automated testing, or automated build processes (that include building a db from scratch), MySQL can be a bit easier to deal with.
A few other points:
The management tools have gotten a lot better since the early days.
If you are interested in transactions, constraints, etc.. be sure you are defining your tables to use the InnoDB storage engine (instead of MyISAM which is designed for speed.)
I do miss MSSQL's schema generating tool, but I think there are equivalent tools out there.
We've used a Linux database server and a window's web server (for .net apps) with great success.
If you are using something like NHibernate or some other non-MS data abstraction layer, the case to look beyond MSSQL is stronger too...
Three points to consider; unfortunately the first two are contradictory:
1) .NET and MySQL were not designed to interact with one another, and there is no official support from either side. You're invariably going to encounter issues trying to use them together.
2) If portability off of Windows may ever be an issue (much .NET code runs quite nicely on other platforms via Mono), you'll want to avoid locking yourself too deeply to MSSQL. That doesn't mean not using it, but being careful that you don't rely on its particular quirks too much.
3) TCO is just a buzzword. It's complete nonsense when it's calculated by anyone other than you. Nobody can make such a calculation and honestly claim that it has any relevance outside their particular environment. There are too many factors, most of which have absolutely nothing to do with things like tool availability.
I've been using the community version of MySQL for alsmost 99% of my project. I like MySQL is that I can deploy via Xcopy and is powerful compare to other "xcopy-able" database server. I also wrote a wrapper to start and stop MySQL & Apache (like LAMP), but with my own implemetation and addon capability
MySQL probably has a lower TCO, since administration and configuration is more simple and straightforward than the Spaghetti GUI that MS SQL makes you do most of the configuration through, having to dig through hundreds of obscure properties dialogs to accomplish even basic administration tasks.
There is one area where MS SQL clearly excels over MySQL in my experience:
Integration with other technologies. MS SQL allows you to replicate back and forth with Oracle and MySQL databases, and provides SSIS for executing scheduled data transformations from other database servers.
There may be others, but I don't have experience with them.

Is Oracle RDBMS more stable, secure, robust, etc. than MySQL RDBMS?

I've worked on a variety of systems as a programmer, some with Oracle, some with MySQL. I keep hearing people say that Oracle is more stable, more robust, and more secure. Is this the case?
If so in what ways and why?
For the purposes of this question, consider a small-medium sized production DB, perhaps 500,000 records or so.
Yes. Oracle is enterprise grade software.
I'm not sure if its really any more stable that mysql, I haven't used mysql that much, but I dont ever remember having mysql crash on me. I've had oracle crash, but when it does, it gives me more information about why it crashed than I could possibly want, and Oracle support is always there to help ( for a fee ).
Its very very robust, Oracle DB will do virtually everything it can before breaking your data, I've had mysql servers do really weird things when they run out of disk space, Oracle will just halt all transactions, and eventually shutdown if it can't write the files it needs. I've never lost data in oracle, even when I do stupid things like forget the where clause and update every row rather than a single row, its very easy to get the database back to how it was before screwing up.
Not sure about security, certainly Oracle gives you lots of options for how you are going to connect to the DB and authenticate. It gives lots of options regarding which users have access to what, etc. But as with most things, if you want to take security seriously, then you need an expert to do it. Oracle certainly has a lot more to lose if they don't get security right. But, as with all things there has been exploits.
If nothing else, just consider this... When Oracle stuffs up, they have customers who are paying $40k per CPU (if they are suckers and pay list price) license + yearly maintenance fees.. This gives them a very strong intensive to make sure the customers are happy with the product.
For a small database, I'd seriously recommend Oracle XE well before mysql. It has the important features of mysql (Free), its dead easy to install, comes with a nice web interface and application framework (Application Express), if you DB will happy run on a single cpu, 1gb ram and 4gb data, then XE is the way to go IMHO.
Mysql has its uses, many many people have shown that you can build great things with it, but its far behind oracle (and SQL Server, and DB2) in terms of features... But then, its also free and very easy to learn, which for many people is the most important feature.
I've had Oracle create a corrupt database when the disk ran out of space. It's hard to debug, uses loads of resources and is difficult to work with without seriously skilled DBA's holding your hand. Oracle even replaced system binaries (e.g. gcc) in /usr/bin/ when I installed in on an occation.
Working with PostgreSQL, on the other hand, has been much more pleasant. It gives readable error messages and acts in a more understandable way if you're used to work with open source *nix systems. It's quite easy to set up replication, thus making your data fairly secure.
A 500K record database can probably be run on your mobile phone. Seriously, it's so small that both Oracle XE and MySQL will be more than sufficient to manage it.
for smallish DBs (a few million records), Oracle is overkill
you need an experienced DBA to properly install and manage an Oracle system
Oracle has a larger "base overhead", i.e. you need a beefier machine to run Oracle
the "out of the box" experience of Oracle used to be atrocious (i haven't installed an oracle system in years; no idea how it currently behaves), while mysql is very nice
Oracle is a beast that really needs DBA knowledge. I concur with those who say 500k records are nothing. It's not worth the complexity of Oracle if it's simple numeric/text data.
On the other hand, Oracle is extremely efficient with blobs. If each of your records was a 100MB binary file, you'd need a fortune to run it on Oracle (I'd recommend a 3-node RAC cluster with a good SAN).
I have a project that sends data (~10M rows, 1.2GB of data) to three different databases, 2 Oracle and 1 MySQL. I haven't had problems working with either system, nor have I seen any major advantages on either side. If you're in a place that already uses Oracle for other projects, adding on one new database shouldn't be too much of a problem, but if you're thinking of setting up a new database server and don't have anything in place already, MySQL will save you the money.
Oracle Enterprise assumes that there is an Enterprise to support it, ie, a real Oracle DBA. A novice (but competent) DBA should be able to secure MySQL much more easily than Oracle, just because Oracle is inherently more complex. Of course, Oracle has the Enterprise monitoring tools beyond what MySQL currently features (as far as I've seen) but the DBA needs to be able use them to be effective.
Such a small database as you describe could be handled by most anything so I can't see that Oracle would be warranted unless the infrastructure was already in place. Both have replication, transactions and warm-backups so either would serve well.
The answer depends entirely on how you configure each DBMS.
Both are capable of handling 500,000 records many times over.
Oracle is a lot beefier. Many of its features would only be looked for in a larger enterprise or high-performance setting. They're mainly features to do with scaling, replication and load balancing.
For small DBs, consider SQLite. For small-medium, look at MySQL or PostgreSQL. For the largest, look at MSSQL, Oracle, DB2, etc.
Edit: Having read the other answer, I'll add that if your data is really, really critical, you'll want a replicated setup and you'll probably want to look to one of the big DB providers for something like that.
If you can sacrifice potential (exceedingly rare) data losses and would prefer improved performance, look at some of the lighter-weight options.
It's true that Oracle is a beast.
It is also true that Oracle is widely considered the most secure major database.
The problem is that Oracle's devs don't appear to grasp critical security consepts. Oracle is the least secure database server on the market (According to independent security researchers)
http://itic-corp.com/blog/2010/09/sql-server-most-secure-database-oracle-least-secure-database-since-2002/
MySQL is actually fairly secure according to these researchers. I don't know much about the tools available for it. What's most amusing about this research is that the same people who would call Microsoft SQL server a toy would have their data stolen by attackers that MSSQL would thwart because they are using a beast that has a terrible security model rather than a "toy" that is secure.
I'm using Oracle/SQL Server/MySql for different applications and site
No Database beat can Oracle in many different area, but it's the most database that require deep knowledge for the administration.
and if you found a problem with oracle, may spend few times to solve it even with good DBAs guys.
You can go with MySql for 500K or millions of records, it's more light than other DB, and require zero administration work, and will not take a lot of your computer resources, I always have it in my development PC, and never had faced any serious problem with it.
I would require you go with MySql or PostgreSQL if you don't need the advanced featuers of Oracle.

Where to find a good reference when choosing a database?

I and two others are working on a project at the university.
In the project we are making a prototype of a MMORPG.
We have decided to use PostgreSQL as our database. The other databases we considered were MS SQL-server and MySQL.
Does somebody have a good reference which will justify our choice? (preferably written during the last year)
Someone recently recommended me wikivs.com: MySQL vs. PostgreSQL - it is a quite detailed comparison of those two, and might be of help to you.
the most mentioned difference between MySQL and PostgreSQL is about your reading/writing ratios. If you read a lot more than you write, MySQL is usually faster; but if you do a lot of heavy updates to a table, as often as other threads have to read, then the default locking in MySQL is not the best, and PostgreSQL can be a better choice, performance-wise.
IOW, PostgreSQL scales better regarding to DB writes.
that's why it's usually said that MySQL is best for webapps, while PostgreSQL is more 'enterprisey'.
Of course, the picture is not so simple:
InnoDB tables on MySQL have a very different performance behaviour
At the load levels where PostgreSQL's better locks overtake MySQL's, other parts of your platform could be the bottlenecks.
PostgreSQL does comply better with standards, so it can be easier to replace later.
in the end, the choice has so many variables that no matter which way you go, you'll find some important issue that makes it the right choice.
Go with something that someone in your team has actual experience of using in production. All databases have issues which frequent users are aware of.
I cannot stress enough that someone in the team needs PRODUCTION experience of using it. Not using it for their homework, or to keep their list of CDs in.
All of these databases have their advantages and disadvantages. Which is better is dependent on:
Your teams experience
Your exact requirements
Your current environemnt e.g. whats your app written in and going to be hosted on?
SQL servers main problem is the cost unless you use express edition which has performance limitations however its very easy to use and has a number of good tools.
There is a comparison of the different sql versions at:
http://www.microsoft.com/sql/prodinfo/features/compare-features.mspx
You could then compare these with MySQL and PostGre.
If the purpose of this comparison is a theoretical one for your essay then you can reference web pages such as the microsoft link and compare performance, cost etc.
Postgresql has a page of case studies that you can quote and link to.
Really, any of the above would have worked for you. I personally like PostgreSQL. One solid advantage it has over MSSQL (even assuming you can get it for "free") is that PostgreSQL is non-proprietary. If you're going to introduce a dependency into your project (and re-inventing an RDBMS would be crazy), you don't want it to be a black box.

MySQL vs PostgreSQL? Which should I choose for my Django project?

My Django project is going to be backed by a large database with several hundred thousand entries, and will need to support searching (I'll probably end up using djangosearch or a similar project.)
Which database backend is best suited to my project and why? Can you recommend any good resources for further reading?
For whatever it's worth the the creators of Django recommend PostgreSQL.
If you're not tied to any legacy
system and have the freedom to choose
a database back-end, we recommend
PostgreSQL, which achives a fine
balance between cost, features, speed
and stability. (The Definitive Guide to Django, p. 15)
As someone who recently switched a project from MySQL to Postgresql I don't regret the switch.
The main difference, from a Django point of view, is more rigorous constraint checking in Postgresql, which is a good thing, and also it's a bit more tedious to do manual schema changes (aka migrations).
There are probably 6 or so Django database migration applications out there and at least one doesn't support Postgresql. I don't consider this a disadvantage though because you can use one of the others or do them manually (which is what I prefer atm).
Full text search might be better supported for MySQL. MySQL has built-in full text search supported from within Django but it's pretty useless (no word stemming, phrase searching, etc.). I've used django-sphinx as a better option for full text searching in MySQL.
Full text searching is built-in with Postgresql 8.3 (earlier versions need TSearch module). Here's a good instructional blog post: Full-text searching in Django with PostgreSQL and tsearch2
large database with several hundred
thousand entries,
This is not large database, it's very small one.
I'd choose PostgreSQL, because it has a lot more features. Most significant it this case: in PostgreSQL you can use Python as procedural language.
Go with whichever you're more familiar with. MySQL vs PostgreSQL is an endless war. Both of them are excellent database engines and both are being used by major sites. It really doesn't matter in practice.
All the answers bring interesting information to the table, but some are a little outdated, so here's my grain of salt.
As of 1.7, migrations are now an integral feature of Django. So they documented the main differences that Django developers might want to know beforehand.
Backend Support
Migrations are supported on all backends that Django ships with, as
well as any third-party backends if they have programmed in support
for schema alteration (done via the SchemaEditor class).
However, some databases are more capable than others when it comes to schema migrations; some of the caveats are covered below.
PostgreSQL
PostgreSQL is the most capable of all the databases here in terms of schema support.
MySQL
MySQL lacks support for transactions around schema alteration operations, meaning that if a migration fails to apply you will have to manually unpick the changes in order to try again (it’s impossible to roll back to an earlier point).
In addition, MySQL will fully rewrite tables for almost every schema operation and generally takes a time proportional to the number of rows in the table to add or remove columns. On slower hardware this can be worse than a minute per million rows - adding a few columns to a table with just a few million rows could lock your site up for over ten minutes.
Finally, MySQL has relatively small limits on name lengths for columns, tables and indexes, as well as a limit on the combined size of all columns an index covers. This means that indexes that are possible on other backends will fail to be created under MySQL.
SQLite
SQLite has very little built-in schema alteration support, and so
Django attempts to emulate it by:
Creating a new table with the new schema
Copying the data across
Dropping the old table
Renaming the new table to match the original name
This process generally works well, but it can be slow and occasionally
buggy. It is not recommended that you run and migrate SQLite in a
production environment unless you are very aware of the risks and its
limitations; the support Django ships with is designed to allow
developers to use SQLite on their local machines to develop less
complex Django projects without the need for a full database.
Even if Postgresql looks better, I find it has some performances issues with Django:
Postgresql is made to handle "long connections" (connection pooling, persistant connections, etc.)
MySQL is made to handle "short connections" (connect, do your queries, disconnect, has some performances issues with a lot of open connections)
The problem is that Django does not support connection pooling or persistant connection, it has to connect/disconnect to the database at each view call.
It will works with Postgresql, but connecting to a Postgresql cost a LOT more than connecting to a MySQL database (On Postgresql, each connection has it own process, it's a lot slower than just popping a new thread in MySQL).
Then you get some features like the Query Cache that can be really useful on some cases. (But you lost the superb text search of PostgreSQL)
When a migration fails in django-south, the developers encourage you not to use MySQL:
! The South developers regret this has happened, and would
! like to gently persuade you to consider a slightly
! easier-to-deal-with DBMS (one that supports DDL transactions)
Having gone down the road of MySQL because I was familiar with it (and struggling to find a proper installer and a quick test of the slow web "workbench" interface of postgreSQL put me off), at the end of the project, after a few months after deployment, while looking into back up options, I see you have to pay for MySQL's enterprise back up features. Gotcha right at the very end.
With MySql I had to write some ugly monster raw SQL queries in Django because no select distinct per group for retrieving the latest per group query. Also looking at postgreSQL's full-text search and wishing I had used postgresSQL.
I recommend PostgreSQL even if you are familiar with MySQL, but your mileage may vary.
UPDATE: DBeaver is a great equivalent of MySql Workbench gui tool but works with PostgreSQL very nicely (and many others as its a universal DB tool).
To add to previous answers :
"Full text search might be better supported for MySQL"
The FULLTEXT index in MySQL is a joke.
It only works with MyISAM tables, so you lose ACID, Transactions, Constraints, Relations, Durability, Concurrency, etc.
INSERT/UPDATE/DELETE to a largish TEXT column (like a forum post) will a rebuild a large part of the index. If it does not fit in myisam_key_buffer, then large IO will occur. I've seen a single forum post insertion trigger 100MB or more of IO ... meanwhile the posts table is exclusiely locked !
I did some benchmarking (3 years ago, may be stale...) which showed that on large datasets, basically postgres fulltext is 10-100x faster than mysql, and Xapian 10-100x faster than postgres (but not integrated).
Other reasons not mentioned are the extremely smart query optimizer, large choice of join types (merge, hash, etc), hash aggregation, gist indexes on arrays, spatial search, etc which can result in extremely fast plans on very complicated queries.
Will this application be hosted on your own servers or by a hosting company? Make sure that if you are using a hosting company, they support the database of choice.
There is a major licensing difference between the two db that will affect you if you ever intend to distribute code using the db. MySQL's client libraries are GPL and PostegreSQL's is under a BSD like license which might be easier to work with.