MySQL vs PostgreSQL? Which should I choose for my Django project?

My Django project is going to be backed by a large database with several hundred thousand entries, and will need to support searching (I'll probably end up using djangosearch or a similar project.)
Which database backend is best suited to my project and why? Can you recommend any good resources for further reading?

For whatever it's worth, the creators of Django recommend PostgreSQL.
"If you're not tied to any legacy system and have the freedom to choose a database back-end, we recommend PostgreSQL, which achieves a fine balance between cost, features, speed and stability." (The Definitive Guide to Django, p. 15)

As someone who recently switched a project from MySQL to Postgresql I don't regret the switch.
The main difference, from a Django point of view, is more rigorous constraint checking in Postgresql, which is a good thing, and also it's a bit more tedious to do manual schema changes (aka migrations).
There are probably 6 or so Django database migration applications out there and at least one doesn't support Postgresql. I don't consider this a disadvantage though because you can use one of the others or do them manually (which is what I prefer atm).
Full text search might be better supported for MySQL. MySQL has built-in full text search supported from within Django but it's pretty useless (no word stemming, phrase searching, etc.). I've used django-sphinx as a better option for full text searching in MySQL.
Full text searching is built-in with Postgresql 8.3 (earlier versions need TSearch module). Here's a good instructional blog post: Full-text searching in Django with PostgreSQL and tsearch2

"large database with several hundred thousand entries,"
This is not a large database; it's a very small one.
I'd choose PostgreSQL, because it has a lot more features. Most significant in this case: in PostgreSQL you can use Python as a procedural language.

Go with whichever you're more familiar with. MySQL vs PostgreSQL is an endless war. Both of them are excellent database engines and both are being used by major sites. It really doesn't matter in practice.

All the answers bring interesting information to the table, but some are a little outdated, so here's my grain of salt.
As of 1.7, migrations are now an integral feature of Django. So they documented the main differences that Django developers might want to know beforehand.
Backend Support
Migrations are supported on all backends that Django ships with, as
well as any third-party backends if they have programmed in support
for schema alteration (done via the SchemaEditor class).
However, some databases are more capable than others when it comes to schema migrations; some of the caveats are covered below.
PostgreSQL
PostgreSQL is the most capable of all the databases here in terms of schema support.
MySQL
MySQL lacks support for transactions around schema alteration operations, meaning that if a migration fails to apply you will have to manually unpick the changes in order to try again (it’s impossible to roll back to an earlier point).
In addition, MySQL will fully rewrite tables for almost every schema operation and generally takes a time proportional to the number of rows in the table to add or remove columns. On slower hardware this can be worse than a minute per million rows - adding a few columns to a table with just a few million rows could lock your site up for over ten minutes.
Finally, MySQL has relatively small limits on name lengths for columns, tables and indexes, as well as a limit on the combined size of all columns an index covers. This means that indexes that are possible on other backends will fail to be created under MySQL.
SQLite
SQLite has very little built-in schema alteration support, and so
Django attempts to emulate it by:
Creating a new table with the new schema
Copying the data across
Dropping the old table
Renaming the new table to match the original name
This process generally works well, but it can be slow and occasionally
buggy. It is not recommended that you run and migrate SQLite in a
production environment unless you are very aware of the risks and its
limitations; the support Django ships with is designed to allow
developers to use SQLite on their local machines to develop less
complex Django projects without the need for a full database.
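The four rebuild steps quoted above can be sketched with Python's standard sqlite3 module. This is only an illustration of the general technique, not Django's actual SchemaEditor code; the `rebuild_table` helper, the `entry` table and its columns are hypothetical names.

```python
import sqlite3


def rebuild_table(conn, table, new_columns_sql, keep_columns):
    """Emulate a schema change by create / copy / drop / rename.

    Assumes `conn` is in autocommit mode (isolation_level=None) so that
    the explicit BEGIN/COMMIT below control the transaction.
    """
    tmp = f"{table}__new"
    cols = ", ".join(keep_columns)
    conn.execute("BEGIN")
    try:
        # 1. Create a new table with the new schema
        conn.execute(f"CREATE TABLE {tmp} ({new_columns_sql})")
        # 2. Copy the data across
        conn.execute(f"INSERT INTO {tmp} ({cols}) SELECT {cols} FROM {table}")
        # 3. Drop the old table
        conn.execute(f"DROP TABLE {table}")
        # 4. Rename the new table to match the original name
        conn.execute(f"ALTER TABLE {tmp} RENAME TO {table}")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, title TEXT, legacy TEXT)")
conn.execute("INSERT INTO entry (title, legacy) VALUES ('first', 'x'), ('second', 'y')")

# Drop the hypothetical `legacy` column by rebuilding the table without it.
rebuild_table(conn, "entry", "id INTEGER PRIMARY KEY, title TEXT", ["id", "title"])

columns = [row[1] for row in conn.execute("PRAGMA table_info(entry)")]
titles = [row[0] for row in conn.execute("SELECT title FROM entry ORDER BY id")]
```

Every row is copied, which is why this approach gets slow on large tables.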

Even if PostgreSQL looks better, I find it has some performance issues with Django:
PostgreSQL is made to handle "long connections" (connection pooling, persistent connections, etc.)
MySQL is made to handle "short connections" (connect, do your queries, disconnect; it has some performance issues with a lot of open connections)
The problem is that Django does not support connection pooling or persistent connections; it has to connect to and disconnect from the database on each view call.
It will work with PostgreSQL, but connecting to PostgreSQL costs a LOT more than connecting to a MySQL database (on PostgreSQL each connection has its own process, which is a lot slower than just spawning a new thread in MySQL).
Then you get some features like the Query Cache that can be really useful in some cases. (But you lose the superb text search of PostgreSQL.)
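For readers on newer Django releases: Django 1.6 and later can reuse connections between requests through the CONN_MAX_AGE database setting (persistent connections, not a pool). A minimal settings sketch follows; the NAME, USER and HOST values are placeholders, and the engine path shown is the one used since Django 1.9 (older releases used `django.db.backends.postgresql_psycopg2`).

```python
# settings.py fragment (sketch). NAME, USER and HOST are placeholder values.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "myproject",
        "USER": "myproject",
        "HOST": "localhost",
        # Seconds a connection may be reused before it is closed.
        # 0 (the default) closes it at the end of each request;
        # None keeps it open indefinitely.
        "CONN_MAX_AGE": 60,
    }
}
```

With a non-zero value, the per-request connection cost described above is paid far less often.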

When a migration fails in django-south, the developers encourage you not to use MySQL:
! The South developers regret this has happened, and would
! like to gently persuade you to consider a slightly
! easier-to-deal-with DBMS (one that supports DDL transactions)
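To illustrate what "supports DDL transactions" buys you: when schema changes run inside a transaction, a migration that fails halfway can be rolled back as a unit. The sketch below uses SQLite from Python's standard library as a stand-in, since it also allows DDL inside a transaction; the table names are hypothetical and this is not South's code.

```python
import sqlite3

# Autocommit mode, so the explicit BEGIN/ROLLBACK below are in charge.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")


def table_names(conn):
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [r[0] for r in rows]


# A "migration" with three schema changes, the last of which fails.
conn.execute("BEGIN")
try:
    conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("ALTER TABLE author ADD COLUMN email TEXT")
    conn.execute("ALTER TABLE missing_table ADD COLUMN x TEXT")  # no such table
    conn.execute("COMMIT")
except sqlite3.OperationalError:
    # The first two changes are undone together with the failed one.
    conn.execute("ROLLBACK")

tables_after = table_names(conn)
author_columns = [row[1] for row in conn.execute("PRAGMA table_info(author)")]
```

On a database without transactional DDL, the first two statements would already be permanent at the point of failure, and you would have to undo them by hand.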

I went down the road of MySQL because I was familiar with it (struggling to find a proper installer, plus a quick test of PostgreSQL's slow web "workbench" interface, put me off PostgreSQL). At the end of the project, a few months after deployment, while looking into backup options, I saw that you have to pay for MySQL's enterprise backup features. Gotcha right at the very end.
With MySQL I had to write some ugly monster raw SQL queries in Django, because there is no select distinct per group for the "latest row per group" query. I was also looking at PostgreSQL's full-text search and wishing I had used PostgreSQL.
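For context, the "latest per group" query can be written portably with a self-join. PostgreSQL has DISTINCT ON for this, which Django exposes as `distinct(*fields)` on PostgreSQL only. The sketch below runs against SQLite so it needs nothing beyond the standard library; the `reading` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (sensor TEXT, taken_at INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO reading VALUES (?, ?, ?)",
    [("a", 1, 10.0), ("a", 2, 11.5), ("b", 1, 7.0), ("b", 3, 6.5), ("c", 5, 1.0)],
)

# Keep a row only if no newer row exists for the same sensor.
LATEST_PER_GROUP = """
    SELECT r.sensor, r.taken_at, r.value
    FROM reading AS r
    LEFT JOIN reading AS newer
      ON newer.sensor = r.sensor AND newer.taken_at > r.taken_at
    WHERE newer.sensor IS NULL
    ORDER BY r.sensor
"""
latest = conn.execute(LATEST_PER_GROUP).fetchall()
```

In Django this kind of statement typically ends up in a raw query on backends without DISTINCT ON, which is the ugliness described above.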
I recommend PostgreSQL even if you are familiar with MySQL, but your mileage may vary.
UPDATE: DBeaver is a great equivalent of the MySQL Workbench GUI tool that works very nicely with PostgreSQL (and many others, as it's a universal DB tool).

To add to previous answers :
"Full text search might be better supported for MySQL"
The FULLTEXT index in MySQL is a joke.
It only works with MyISAM tables, so you lose ACID, Transactions, Constraints, Relations, Durability, Concurrency, etc.
INSERT/UPDATE/DELETE to a largish TEXT column (like a forum post) will rebuild a large part of the index. If it does not fit in myisam_key_buffer, then large IO will occur. I've seen a single forum post insertion trigger 100MB or more of IO ... meanwhile the posts table is exclusively locked!
I did some benchmarking (3 years ago, may be stale...) which showed that on large datasets, basically postgres fulltext is 10-100x faster than mysql, and Xapian 10-100x faster than postgres (but not integrated).
Other reasons not mentioned are the extremely smart query optimizer, large choice of join types (merge, hash, etc), hash aggregation, gist indexes on arrays, spatial search, etc which can result in extremely fast plans on very complicated queries.

Will this application be hosted on your own servers or by a hosting company? Make sure that if you are using a hosting company, they support the database of choice.

There is a major licensing difference between the two databases that will affect you if you ever intend to distribute code using the database. MySQL's client libraries are GPL, while PostgreSQL's are under a BSD-like license, which might be easier to work with.

Related

Benefits, etc of using mySQL over SQLite (RoR)

I'm building a web application right now for my company and for whatever reason, can't get mySQL working on my Mac w/ my Ruby install (OSX 10.5). SQLite works fine though, so would it be a problem to use SQLite for now so I can get to work on this and then just change up my database.yml file to point to a mySQL database when I deploy (assuming I rerun migrations and such)?
Also, what are the benefits/drawbacks of using mySQL over SQLite in a RoR application? I've always used mySQL by default in the past, but never learned SQL directly (always through ActiveRecord) and never thought too much about the difference.
Benefits of MySQL/PostgreSQL/etc
Pros
Stronger data typing, which means cleaner data
Ability to store more data
Scale better to larger data sets
Spatial support (think GPS)
Full Text Search (FTS)
Cons
Stronger data typing means data will be validated, bad data will cause errors
Not a good candidate (if even possible) for devices with limited resources (iPhone, Blackberry, iPad, etc)
I would pick PostgreSQL v8.4+ over MySQL given the choice. MySQL's features lag behind the rest of the major SQL database alternatives.
The biggest performance issue you may run into is table locks. SQLite unfortunately does not have row-level locking. So if your app is going to run multiple processes/threads (as with multiple web users), it's likely some threads will not be able to perform an SQL operation. For this reason I would go with MySQL, or perhaps PostgreSQL.
Should be no problems, as MySQL should have a superset of SQLite capabilities, and as #Sean pointed out, performance should only increase. Just try to make sure you're not using anything too SQLite specific (I'm mainly a SQL Server and Oracle guy, so don't know what that would be, if anything). Remember, the "S" in SQL stands for Structured, not Standard ;)
Paul.
SQLite is perfect for a desktop or smartphone application ("embedded" usage). However, if you plan to build a web application, you are highly encouraged to make use of a non-embedded DBMS like MySQL. The benefits are countless, such as 3rd party design and analysis apps, performance etc ...

Maximum capabilities of MySQL

How do I know when a project is just too big for MySQL and I should use something with a better reputation for scalability?
Is there a max database size for MySQL before degradation of performance occurs? What factors contribute to MySQL not being a viable option compared to a commercial DBMS like Oracle or SQL Server?
Google uses MySQL. Is your project bigger than Google?
Smart-alec comments aside, MySQL is a professional level database application. If your application puts a strain on MySQL, I bet it'll do the same to just about any other database.
If you are looking for a couple of examples:
Facebook moved to Cassandra only after it was storing over 7 Terabytes of inbox data. (Source: Lakshman, Malik: Cassandra - A Decentralized Structured Storage System.) (... Even though they were having quite a few issues at that stage.)
Wikipedia also handles hundreds of Gigabytes of text data in MySQL.
I work for a very large Internet company. MySQL can scale very, very large with very good performance, with a couple of caveats.
One problem you might run into is that an index greater than 4 gigabytes can't go into memory. I spent a lot of time once trying to improve the MySQL's full-text performance by fiddling with some index parameters, but you can't get around the fundamental problem that if your query hits disk for an index, it gets slow.
You might find some helper applications that can help solve your problem. For the full-text problem, there is Sphinx: http://www.sphinxsearch.com/
Jeremy Zawodny, who now works at Craig's List, has a blog on which he occasionally discusses the performance of large databases: http://blog.zawodny.com/
In summary, your project probably isn't too big for MySQL. It may be too big for some of the ways that you've used MySQL before, and you may need to adapt them.
Mostly it is table size.
I am assuming here that you will use the Oracle InnoDB plugin for MySQL as your engine. If you do not, that probably means you're using a commercial engine such as InfiniDB, Infobright or Tokutek, in which case your questions should be sent to them.
InnoDB gets a bit nasty with very large tables. You are advised to partition your tables if at all possible with very large instances. Essentially, if your (frequently used) indexes don't all fit into ram, inserts will be very slow as they need to touch a lot of pages not in ram. This cannot be worked around.
You can use the MySQL 5.1 partitioning feature if it does what you want, or partition your tables at the application level if it does not. If you can get your tables' indexes to fit in ram, and only load one table at a time, then you're on a winner.
You can use the plugin's compression to make your ram go a bit further (as the pages are compressed in ram as well as on disc) but it cannot beat the fundamental limitation.
If your table's indexes don't all (or at least MOSTLY - if you have a few indexes which are NULL in 99.99% of cases you might get away without those ones) fit in ram, insert speed will suck.
Database size is not a major issue, provided your tables individually fit in ram while you're doing bulk loading (and of course, you only load one at once).
These limitations really happen with most row-based databases. If you need more, consider a column database.
Infobright and Infinidb both use a mysql-based core and are column based engines which can handle very large tables.
Tokutek is quite interesting too - you may want to contact them for an evaluation.
When you evaluate the engine's suitability, be sure to load it with very large data on production-grade hardware. There's no point in testing it with a (e.g.) 10G database, that won't prove anything.
MySQL is a commercial DBMS, you just have the option to get the support/monitoring that is offered by Oracle or Microsoft. Or you can use community support or community provided monitoring software.
Size is not the only thing you should look at in operations. Also critical are:
Scenarios for backup and restore?
Maintenance. Example: SQL Server Enterprise can rebuild an index WHILE THE OLD ONE IS AVAILABLE - transparently. This means no downtime for an index rebuild.
Availability (basically you do not want to have to restore a 5000 GB database if a server dies) - mirroring preferred, replication "sucks" (technically).
Whatever you go for, be careful with Oracle RAC (their cluster) - it is known to be "problematic" (to put it politely). SQL Server is known to be a lot cheaper and to scale a lot worse (no "RAC" option), but it basically works without making admins want to commit suicide every hour (the "RAC" option seems to do that). Scalability "a lot worse" is still good enough for the Terra Server (http://msdn.microsoft.com/en-us/library/aa226316(SQL.70).aspx)
There were some questions here recently from people having problems rebuilding indices on a 10 GB database or something.
So much for my 2 cents. I am sure some MySQL specialists will jump in on issues there.

Switching from MySQL to Cassandra - Pros/Cons?

For a bit of background - this question deals with a project running on a single small EC2 instance, and is about to migrate to a medium one. The main components are Django, MySQL and a large number of custom analysis tools written in python and java, which do the heavy lifting. The same machine is running Apache as well.
The data model looks like the following - a large amount of real time data comes in streamed from various networked sensors, and ideally, I'd like to establish a long-poll approach rather than the current poll every 15 minutes approach (a limitation of computing stats and writing into the database itself). Once the data comes in, I store the raw version in MySQL, let the analysis tools loose on this data, and store statistics in another few tables. All of this is rendered using Django.
Relational features I would need -
Order by [SliceRange in Cassandra's API seems to satisfy this]
Group by
Manytomany relations between multiple tables [Cassandra SuperColumns seem to do well for one to many]
Sphinx on this gives me a nice full text engine, so that's a necessity too. [On Cassandra, the Lucandra project seems to satisfy this need]
My major problem is that data reads are extremely slow (and writes aren't that hot either). I don't want to throw a lot of money and hardware on it right now, and I'd prefer something that can scale easily with time. Vertically scaling MySQL is not trivial in that sense (or cheap).
So essentially, after having read a lot about NOSQL and experimented with things like MongoDB, Cassandra and Voldemort, my questions are,
On a medium EC2 instance, would I gain any benefits in reads/writes by shifting to something like Cassandra? This article (pdf) definitely seems to suggest that. Currently, I'd say a few hundred writes per minute would be the norm. For reads - since the data changes every 5 minutes or so, cache invalidation has to happen pretty quickly. At some point, it should be able to handle a large number of concurrent users as well. The app performance currently gets killed on MySQL doing some joins on large tables even if indexes are created - something to the order of 32k rows takes more than a minute to render. (This may be an artifact of EC2 virtualized I/O as well). Size of tables is around 4-5 million rows, and there are about 5 such tables.
Everyone talks about using Cassandra on multiple nodes, given the CAP theorem and eventual consistency. But, for a project that is just beginning to grow, does it make sense to deploy a one node cassandra server? Are there any caveats? For instance, can it replace MySQL as a backend for Django? [Is this recommended?]
If I do shift, I'm guessing I'll have to rewrite parts of the app to do a lot more "administrivia" since I'd have to do multiple lookups to fetch rows.
Would it make any sense to just use MySQL as a key value store rather than a relational engine, and go with that? That way I could utilize a large number of stable APIs available, as well as a stable engine (and go relational as needed). (Bret Taylor's post from Friendfeed on this - http://bret.appspot.com/entry/how-friendfeed-uses-mysql)
Any insights from people who've done a shift would be greatly appreciated!
Thanks.
Cassandra and the other distributed databases available today do not provide the kind of ad-hoc query support you are used to from sql. This is because you can't distribute queries with joins performantly, so the emphasis is on denormalization instead.
However, Cassandra 0.6 (beta officially out tomorrow, but you can build from the 0.6 branch yourself if you're impatient) supports Hadoop map/reduce for analytics, which actually sounds like a good fit for you.
Cassandra provides excellent support for adding new nodes painlessly, even to an initial group of one.
That said, at a few hundred writes/minute you're going to be fine on mysql for a long, long time. Cassandra is much better at being a key/value store (even better, key/columnfamily) but MySQL is much better at being a relational database. :)
There is no django support for Cassandra (or other nosql database) yet. They are talking about doing something for the next version after 1.2, but based on talking to django devs at pycon, nobody is really sure what that will look like yet.
If you're a relational database developer (as I am), I'd suggest/point out:
Get some experience working with Cassandra before you commit to its use on a production system... especially if that production system has a hard deadline for completion. Maybe use it as the backend for something unimportant first.
It's proving more challenging than I'd anticipated to do simple things that I take for granted about data manipulation using SQL engines. In particular, indexing data and sorting result sets is non-trivial.
Data modelling has proven challenging as well. As a relational database developer you come to the table with a lot of baggage... you need to be willing to learn how to model data very differently.
These things said, I strongly recommend building something in Cassandra. If you're like me, then doing so will challenge your understanding of data storage and make you rethink a relational-database-fits-all-situations outlook that I didn't even realize I held.
Some good resources I've found include:
Dominic Williams' Cassandra blog posts
Secondary Indexes in Cassandra
More from Ed Anuff on indexing
Cassandra book (not fantastic, but a good start)
"WTF is a SuperColumn" pdf
django-cassandra is in early beta. Also, Django wasn't made for NoSQL databases. The Django ORM is built around SQL (Django recommends PostgreSQL). If you need to use ONLY NoSQL (you can mix SQL and NoSQL in the same app), you have to take the risk of using a NoSQL ORM (which is significantly slower than a traditional SQL ORM or direct use of the NoSQL store), or completely rewrite the Django ORM. But in that case I can't see why you would need Django. Maybe you can use something else, like Tornado?

Are the consistency/data loss/query optimization issues I read about "that bad"?

As I've been looking into the differences between Postgres and MySQL, it has struck me that, if what I read is to be believed, MySQL should be (disclaimer: by reading the rest of this sentence, you agree to read the next paragraph as well) the laughingstock of the RDBMS world: it doesn't enforce ACID by default, the net is rife with stories of MySQL-related data loss, and by all accounts the query optimizer is a joke.
But none of this seems to matter. It's not hard to tell that MySQL has about a million times* as much hype as Postgres (it's LAMP, not LAPP), big installations of MySQL are not unheard of (LJ? Digg?) and I haven't noticed a drop in MySQL's popularity.
This makes me wonder: are these "problems" with MySQL really that bad?
So, if you have used MySQL for a reasonably large project**, what was your experience like? Did you use Postgres as well? How was it worse? How was it better?
*: [citation needed]
**: I'm well aware that, for "small things" (blogs, what have you), MySQL (along with practically every other RDB) is just fine.
Since it's tagged [subjective], I'll be subjective. For me it's about the little things. PostgreSQL is more developer friendly and makes it easy to do the right thing regarding data integrity by default.
If you give MySQL an incorrect type, it will implicitly convert it even if the conversion is incorrect. PostgreSQL will complain.
EXPLAIN in PostgreSQL is way more useful than in MySQL. It gives you the exact structured query plan: what kind of algorithm it will use, what cost each step has, etc. This means that if the query optimizer in MySQL doesn't do what you think it does, you will have a hard time debugging it.
If you ever wrote anything more complex in the MySQL stored procedure language, you will know how painful it is. PL/pgSQL is actually a nice language + you can use many other languages.
MySQL doesn't have sequences, so if you need them you have to roll your own. Most people will do it wrong and have race conditions in their code.
PostgreSQL exposes most of its internal lock types to the developer. If you need to lock your table in a special way, you can do that.
Everything is programmable in PostgreSQL. For example, if you need your own data type for some specific data, you can add it. You can add casts and operators for the data types. Probably not worth the effort for small projects, but it's better than storing things as strings.
PostgreSQL adds every action including DDL changes to a transaction, unlike MySQL. If you have a conversion script that creates/drops tables, BEGIN/END won't help you in MySQL to keep it in consistent state.
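On the sequences point above: the classic racy version reads MAX(id) + 1 and then inserts, so two concurrent sessions can be handed the same number. A minimal sketch of a safer hand-rolled counter is shown below, using SQLite from the standard library as a stand-in. The `sequences` table and `next_value` helper are hypothetical; on MySQL the same idea would rely on a row lock held inside an InnoDB transaction.

```python
import sqlite3

# Autocommit mode, so the explicit BEGIN IMMEDIATE/COMMIT control the locking.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE sequences (name TEXT PRIMARY KEY, value INTEGER NOT NULL)")
conn.execute("INSERT INTO sequences VALUES ('invoice_no', 0)")


def next_value(conn, name):
    """Increment and return a named counter as one atomic step."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before touching the row
    try:
        cur = conn.execute(
            "UPDATE sequences SET value = value + 1 WHERE name = ?", (name,)
        )
        if cur.rowcount != 1:
            raise KeyError(name)
        (value,) = conn.execute(
            "SELECT value FROM sequences WHERE name = ?", (name,)
        ).fetchone()
        conn.execute("COMMIT")
        return value
    except Exception:
        conn.execute("ROLLBACK")
        raise


issued = [next_value(conn, "invoice_no") for _ in range(3)]
```

Unlike a native sequence, this serializes every caller on one row, which is part of why rolling your own is unattractive.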
That doesn't mean it's impossible to write good database applications with MySQL, it just requires more effort.
MySQL can be used for reasonably large applications, provided you really know what you do and don't trust the defaults.
MySQL defaults are optimized to be easy-to-use and to get started quickly and to provide best performance (usually). Other databases choose defaults that are at the very least ACID and are scalable (i.e. choose defaults that are not necessarily the best/fastest for small data sets)
Another item is that MySQL only learned to be a "real database" relatively recently, while almost all competing products started life with full ACID in mind.
MySQL had problems with almost all aspects of ACID at one time or another. Most of them are gone or can be configured away, but you will have to check each one. The problem with troubles in atomicity for example is that you will not notice them until you place your system under heavy load (which often coincides with it being a production system, unfortunately).
So my summary would be: MySQL is capable of working in these environments, but it takes work. And the path it took to get to that point cost it quite a few points in the confidence area.
Provided you know what its capabilities are, then it may fit your use case.
If used correctly, then it is ACID compliant. If used incorrectly, it is not. The trouble is, that people seem to assume that it's a good thing to have ACID compliance.
In reality ACID is often the enemy of performance (Particularly the D for durability). By relaxing durability very slightly, we can typically get a very large performance boost.
Likewise, even using the MyISAM engine (which doesn't have much by way of durability, and not a lot of the others either) is still appropriate to some problem domains.
We are using MySQL in some applications - and it is doing a pretty good job.
In the newer projects we are using the InnoDB engine - and albeit it may be slower than the default engine it is working well.
Right now we are using an ORM mapper - and so most of the complexity is hidden behind the ORM mapper (and working nice).
I think the infrastructure (Tools and information) is one of MySQL's big plusses: we are using really nice tools: Toad for MySQL and MySQL Administrator.
Although I have to admit that I had a shocking experience last week when helping a friend with a SQL statement: the correlated subquery nearly stopped his MySQL server, but with the trick of enclosing it in another query it worked really well.
This is nothing which REALLY shocks me - because I've used other DB systems which cost big bucks (I'm looking at you - DB2) - and they had other things to work around. (maybe not as drastic - but still you had to optimize for them).
I haven't used both for a single large project, but having used both I have some idea of how they compare.
In general almost all MySQL's problems can be worked around with good discipline. The issue is more that developer has to know all the gotchas and work around them. After working with PostgreSQL or Oracle this feels a bit like death by a thousand papercuts. You get that used to stuff just working.
This is a pretty significant issue in the types of stuff that I have worked on: complex schemas with complex queries and lots of data. Tight schedules leave little time for performance engineering, so getting consistently reasonable performance without having to manually optimize queries is important. A good cost-based optimizer is almost a requirement. Combine that with quite a lot of outsourcing to development teams that don't have the experience to catch all the gotchas in time, and the little issues escalate to large QA problems. Hitting any of MySQL's silent data corruption gotchas in production is something that really scares me. I'll take any declarative constraints at the database level that I can get to have at least some safety net; MySQL unfortunately falls short on that.
PostgreSQL has the added benefit that it can run significantly more algorithms using more advanced data-structures in the database. Most of our large projects have a few cases where MySQL will hit its limits. Moving the algorithms outside the database requires considerably more effort with pretty tricky code involving correct locking and synchronization. In particular I have at one time or another hit the need for partial indexes, indexes on expressions, custom aggregate functions, set returning stored procedures, array and hash datatypes, inverted indexes on array values, update/delete-returning, deferrable foreign key constraints.
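As one concrete case from that list, a unique partial index can enforce a rule such as "at most one active row per user" inside the database instead of in application locking code. SQLite also accepts the `CREATE INDEX ... WHERE` form, so the sketch below runs on the standard library alone; the `subscription` table is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subscription (id INTEGER PRIMARY KEY, user_id INTEGER, active INTEGER)"
)
# The index covers only active rows, so uniqueness applies only to them.
conn.execute(
    "CREATE UNIQUE INDEX one_active_per_user ON subscription (user_id) WHERE active = 1"
)

# Any number of inactive rows per user is fine.
conn.execute("INSERT INTO subscription (user_id, active) VALUES (1, 0)")
conn.execute("INSERT INTO subscription (user_id, active) VALUES (1, 0)")
# The first active row is accepted.
conn.execute("INSERT INTO subscription (user_id, active) VALUES (1, 1)")

try:
    # A second active row for the same user violates the partial index.
    conn.execute("INSERT INTO subscription (user_id, active) VALUES (1, 1)")
    second_active_rejected = False
except sqlite3.IntegrityError:
    second_active_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM subscription").fetchone()[0]
```

Without partial indexes, the same rule needs a check-then-insert in application code, which is where the tricky locking comes in.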
On the other hand, MySQL has, at least for now, a better story for scale-out. If I had to support a huge number of users on a reasonably simple application, and had the team to build a heavily partitioned and replicated database with eventual consistency, I'd pick MySQL over PostgreSQL for the low level data storage building block. On the other hand, the competitors in that space are the key-value databases.
are these "problems" with MySQL really that bad?
Actually, the pain MySQL will inflict on you can range from moderate to insane, and much of it depends on MyISAM.
I find a good rule of thumb is this :
are you backing up some MyISAM tables ?
MyISAM is great for data you don't really care about, like traffic logs and the like, or for data that you can easily restore in case of a problem since it's read-only and hence never changed since the time you loaded that 10GB dump. In those cases the compact row format of MyISAM brings great space savings (that however do not translate into faster seq scan speed, for some reason).
If the data you put in MyISAM tables is worth backing up, you are going to enter in a world of hurt when you realize some day that it is all inconsistent because of the lack of FK and constraint checks, and incidentally all your backups will contain inconsistent data too.
If you make lots of concurrent updates to MyISAM tables, then you are gonna go way past the world of hurt stage: when the load reaches a certain threshold, you are doomed. Of course the readers block writers which block readers which block queued writers, etc, so the performance is bad, load avg goes to 200, and your box is nuked. But I could also consistently crash MyISAM tables in a benchmark I wrote 2 years ago just by hitting them with too much load. Random data ensued, sometimes crashing mysql on selects or spewing random errors.
So, if you avoid MyISAM like the plague it is, the problems with MySQL aren't really that bad. InnoDB is robust. However, generally I find it inferior to Postgres, which is faster, has far fewer gotchas, and Gets The Job Done easier and faster.
No, the issues you mention are NOT a big deal. See Google and Facebook as two examples of companies that are using MySQL to accomplish Herculean tasks you'll only ever dream of encountering.
I use the following rules when running a MySQL to prevent headaches down the line:
Take daily, weekly, monthly snapshots of database. More often than not the problems you'll run in to have nothing to do with MySQL, instead it's a boneheaded developer running:
DELETE FROM mytable; # Where is the WHERE?
Use InnoDB by default, the only reason to use MyISAM is for full text search.
Get your database schema under source control.

Disadvantages of MySQL versus other databases

Every single book that teaches programming (or almost anything else) starts off with a whole bunch of spiel on why what it's about (C++, MySQL, waterskiing, skydiving, dentistry, whatever) is the greatest thing in the world. So I open the MySQL O'Reilly book, and read the intro, and get the traditional sermon. The main points that the book mentioned were:
MySQL has been shown to have tied Oracle as the fastest and most scalable database software.
It's free and open source.
Sounds pretty convincing, but I know there are always at least two sides to every story. I knew I needed to be disillusioned when I saw someone suggest to someone to use Oracle instead of MySQL and thought, "Why in the world would you want to do that?!", just because of the few paragraphs I'd read, with no other justification. So let's investigate the other side of the story:
What are some reasons NOT to use MySQL?
Here's just a random list of stuff that popped into my head. It's CW, so feel free to add to it as necessary.
Oracle provides a top notch ERP built on their database. If your company is subject to Sarbanes-Oxley regulations, this is quite a bit above "crucial."
SQL Server licenses come with Analysis Services, Integration Services, and Reporting Services. If you want to do anything with OLAP, ETL, or reporting, these three are great applications that are built on the SQL Server stack.
SQL Server has native .NET data types (in 2008). Absolutely brilliant for .NET shops dealing with geospatial datasets.
MySQL does not support check constraints.
SQL Server includes the over clause, which helps when dealing with the "top n rows in each group" problem. Essentially, you can do aggregate functions partitioned over the dataset any way you'd like.
SQL Server uses Kerberos and Windows authentication natively. MySQL does not tie into Active Directory.
Superior performance on subqueries (almost any database has subquery performance that is superior to MySQL's)
Oracle, SQL Server, PostgreSQL and others have a richer set of join algorithms available to them; this means joins can often be performed faster, especially when large tables are involved.
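For anyone who hasn't met the OVER clause mentioned in the list above, here is a rough sketch of the "top n rows in each group" query. It runs against SQLite through Python's sqlite3 module purely as a convenient stand-in (an assumption; it needs a SQLite build with window functions, 3.25 or later), and the `sales` table and its columns are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per sales rep.
conn.execute("CREATE TABLE sales (region TEXT, rep TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("east", "ann", 500), ("east", "bob", 300), ("east", "cy", 100),
    ("west", "dee", 900), ("west", "eli", 700), ("west", "fay", 200),
])

# ROW_NUMBER() restarts at 1 inside every region (the PARTITION BY),
# so keeping rn <= 2 yields the top two reps per region.
top2 = conn.execute("""
    SELECT region, rep, amount
    FROM (
        SELECT region, rep, amount,
               ROW_NUMBER() OVER (PARTITION BY region
                                  ORDER BY amount DESC) AS rn
        FROM sales
    ) AS ranked
    WHERE rn <= 2
    ORDER BY region, amount DESC
""").fetchall()
```

Without a window function, the same result usually takes a correlated subquery or a self-join per group.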
MySQL has been shown to have tied oracle as the fastest and most scalable database software.
Making that statement about any two database systems is probably enough to throw the book away without reading the rest. Database systems are not commodities that can be compared with a couple lines of information, and will not be for the foreseeable future.
One reason that the statement is obviously false is that MySQL has very limited plan choices available. For instance, MySQL can't use merge join or hash join -- two fundamental algorithms that have useful performance characteristics. That's pretty much the end of the story for many query workloads. It is trivial to show a reasonable query that is orders of magnitude faster with a merge join.
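To show why the choice of join algorithm matters, here is a toy sketch of the three strategies in plain Python. It is not how any real engine implements them; the row format (dicts) and the sample `customers` / `orders` data are assumptions for illustration only.

```python
from collections import defaultdict


def nested_loop_join(left, right, key):
    """Compare every left row with every right row: O(len(left) * len(right))."""
    return [(l, r) for l in left for r in right if l[key] == r[key]]


def hash_join(left, right, key):
    """Build a hash table on one input, probe it with the other: about O(n + m)."""
    buckets = defaultdict(list)
    for r in right:
        buckets[r[key]].append(r)
    return [(l, r) for l in left for r in buckets.get(l[key], [])]


def merge_join(left, right, key):
    """Sort both inputs on the key, then walk them in step."""
    left = sorted(left, key=lambda row: row[key])
    right = sorted(right, key=lambda row: row[key])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key ...
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # ... and pair it with every left row that has the same key.
            while i < len(left) and left[i][key] == lk:
                out.extend((left[i], r) for r in right[j:j_end])
                i += 1
            j = j_end
    return out


customers = [{"cust": 1, "name": "ann"}, {"cust": 2, "name": "bob"},
             {"cust": 3, "name": "cy"}]
orders = [{"cust": 2, "item": "pen"}, {"cust": 1, "item": "ink"},
          {"cust": 2, "item": "pad"}, {"cust": 4, "item": "cap"}]


def pairs(result):
    """Normalise a join result so the three strategies can be compared."""
    return sorted((o["item"], c["name"]) for o, c in result)


nl = pairs(nested_loop_join(orders, customers, "cust"))
hj = pairs(hash_join(orders, customers, "cust"))
mj = pairs(merge_join(orders, customers, "cust"))
```

All three return the same rows; the difference is the amount of work. With two large tables, the nested loop does a comparison per pair of rows, while the hash and merge variants touch each row a small number of times.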
There are plenty of other criticisms of MySQL versus XYZ and vice-versa. My point is that this is a complex issue, and the book is drastically oversimplifying. If you're getting involved in databases at all, you need to spend time diversifying your knowledge and understanding fundamentals.
My personal opinion is that MySQL and SQLite are the worst places to start. Pick something like Oracle (which can be downloaded free of charge for learning/evaluation, which many don't realize), PostgreSQL (BSD license), or MS SQL. FirebirdSQL might be good, too. Once you familiarize yourself with a few systems, you'll be able to make an informed choice about whether the trade-offs MySQL makes are right for you.
Everyone seems to be missing one of the main reasons to stick with Oracle/MS. You've already got a stable full of DBAs that know those products inside and out.
The default collation in mysql is case-insensitive. This is not a problem per se, but I think this strange default is an indication that it was targeted at hobby-developers, rather than professionals. This is a big assumption, but I'd think any professional would expect a database to compare strings for identity by default (i.e. using a binary collation).
Manipulation of tables during transactions causes implicit COMMITs. While this might not look grave at first glance, you will notice that you cannot work under ACID conditions if altering/creating tables is an inherent part of your application.
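For contrast, this is roughly what transactional DDL looks like on an engine that supports it. The sketch uses SQLite via Python's sqlite3 module as a stand-in (an assumption chosen so the example is runnable without a server); the `accounts` and `audit_log` tables are hypothetical.

```python
import sqlite3

# isolation_level=None: we issue BEGIN / ROLLBACK ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")

conn.execute("BEGIN")
conn.execute("INSERT INTO accounts (balance) VALUES (100)")
conn.execute("CREATE TABLE audit_log (note TEXT)")   # DDL inside the transaction
conn.execute("ROLLBACK")                             # undoes the INSERT and the CREATE

tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
row_count = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
```

Here the rollback removes both the inserted row and the new table. With the implicit COMMIT described above, the CREATE TABLE would have committed the INSERT before the ROLLBACK ever ran.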
MySQL can certainly match or beat Oracle in speed. I've done it numerous times myself. OK, so I had to use various table types like blackhole, merge, InnoDB, and MyISAM in just the right places. And it took me a few days to get everything working just right. The Oracle DBA got things working in an hour or two.
MySQL is fine for 98% of the sites out there, maybe more. But it is fairly easy to bring it to a crawl without a lot of data if you don't know what you are doing. Oracle is quite a bit harder to bring to a crawl, but it can still be done. I've worked with both with datasets in the hundreds of millions of records (tiny by some measures). MySQL takes quite a bit more attention.
No database can scale indefinitely, which is why nosql "databases" are becoming so popular. I think the real question is if MySQL is "good enough" for what you need to do. The price is certainly right. The same could be said about PHP.
Why does Facebook use MySQL? Could you imagine what it would cost them to buy enough Oracle licenses!? It's good enough.
The future of Sun (the company behind MySQL) is unclear, and you don't know whether there will be a company to back the product.
MySQL is very tolerant of ambiguities -- something you don't want in a database system. Here are a few examples off the top of my head:
As another poster stated, CHAR and VARCHAR columns are case-insensitive, already a pretty bad sign.
You can INSERT into a table that has a column without a default value that is also NOT NULL. Yes, really! Instead of throwing an error, MySQL will pick a value for you based on the data type, e.g. 0 for numbers.
You can use a GROUP BY statement while some columns are neither using an aggregate function, nor included in the GROUP BY statement. The outcome is pretty much random. No warnings or errors here either, in my experience.
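The GROUP BY complaint is easier to see with data in hand. The sketch below is plain Python, not any database's actual check: for a hypothetical query shaped like `SELECT dept, employee, SUM(salary) FROM staff GROUP BY dept`, it reports which selected columns have more than one candidate value inside a group, which is exactly where a permissive engine has to pick one arbitrarily. The rows and the `ambiguous_columns` helper are made up for illustration.

```python
from collections import defaultdict

rows = [
    {"dept": "ops", "employee": "ann", "salary": 50},
    {"dept": "ops", "employee": "bob", "salary": 70},
    {"dept": "dev", "employee": "cy", "salary": 90},
]


def ambiguous_columns(rows, group_by, selected):
    """Return selected, non-grouped columns that vary within at least one group."""
    seen = defaultdict(lambda: defaultdict(set))   # column -> group -> values
    for row in rows:
        group = tuple(row[c] for c in group_by)
        for col in selected:
            seen[col][group].add(row[col])
    return sorted(
        col for col in selected
        if col not in group_by
        and any(len(values) > 1 for values in seen[col].values())
    )


# "employee" is neither aggregated nor grouped: the "ops" group holds both
# ann and bob, so there is no single right answer for that column.
bad = ambiguous_columns(rows, group_by=["dept"], selected=["dept", "employee"])
fine = ambiguous_columns(rows, group_by=["dept"], selected=["dept"])
```

A strict engine rejects the query outright instead of returning one of the candidates.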
MySQL is also far from rock-solid. Just this month, I discovered a bug in the (admittedly old, but a "stable release") version of MySQL used by DreamHost that results in data loss. (Certain conditions when creating a table with variable-length rows.)
I've been using MySQL for many years and still do, but would never dream of using it for anything serious, where data loss would be a big problem. It's great for non-mission-critical web sites and blogs though.
I knew I needed to be disillusioned
when I saw someone suggest to someone
to use oracle instead of MySQL and
thought, "Why in the world would you
want to do that?!"
Because your company has been using Oracle for the past ten years, or because you equate enterprise usage with 'must be good' and open-source with 'free crap'. That's just about the only reason. Everyone I know who has worked with Oracle loathes it. Everyone I know who has worked with MySQL, assuming they don't love it, at least consider it a better alternative to Oracle in almost every regard.
SQL RDBMSs are so complex, though, that in almost every respect there's something one DB does that another doesn't. It is also, unfortunately, a fact of comparing databases that people quote statistics without using properly configured servers. If you have two default configurations for a server, one might be better than the other, but that's about as far as the comparisons usually go. They don't reflect the fact that these gigantic applications have a million little switches and toggles you can use to speed certain things up, increase reliability and generally screw up bad science.
MySQL tends to be a very general purpose database system, you can use it for almost anything that you'd use Oracle, SQL Server, PostgreSQL, DB2, etc for.
However, these different systems have different strengths. PostgreSQL has a ton more functionality than MySQL and can handle some very specific tasks that MySQL struggles with. SQL Server usually integrates with Microsoft products very easily, whereas with MySQL you'd have to do some extra work to make them play together. Oracle is MASSIVE: they're not just databases, and when you're dealing with large, expansive systems Oracle probably has the gear to cover everything under one roof, whereas you'd need to tie a bunch of disparate systems together to have MySQL as your database system.
Whether or not to use MySQL should be based upon whether or not it is reasonable to use MySQL.
Disclaimer: I have been using MySQL since 2001 and still love it, but here are a few reasons that make me doubt my fidelity...
There are some false arguments (they were true a few years ago) in some of the answers I read. Before making a choice, check the MySQL documentation and its up-to-date list of features. You could be surprised.
Every DB server lacks some functionality. This is not a real blocking issue if you do not specifically need it.
For me, the main issues are elsewhere:
The time needed to have a bug fixed and published in a stable release. It is a shame. (For some bugs... it takes years (no kidding)!)
The frequency of stable releases.
But since this year, the new issues are:
The increasing number of branches (Percona, Google, Facebook, etc.).
Sun is unclear about its strategy.
Many MySQL employees left the company.
It's free and open source.
True. But keep in mind that MySQL is, in many cases, not free for commercial use. MySQL and the connectors (the official drivers for various languages), are GPL licensed.
If you use, say, Connector/.NET to connect to MySQL, your code has to be GPL compatible. It's dual licensed though, so you can buy an enterprise version under another license - and I believe they have a (either free or just very cheap) program that lets you license the connectors under a different license.
Everyone I know using MySQL is unaware of this :-)
Basically, there are several choices for a database. Frankly, in today's world, DB choice is less important than it was a few years ago. Here are a few issues to consider.
Most of the current database systems in widespread use such as SQL Server (and SQL Server Express), Oracle, MySQL, SQLite, etc. are relatively standards compliant and can be used somewhat interchangeably. Some serve different niche markets. For example, SQL Server, MySQL, and Oracle are all good choices for large Enterprise applications. SQLite is very good for applications which deploy on a client and need a local database with a small footprint and minimal configuration. (In my opinion, Oracle is extremely over-priced and is backed by an arrogant, unresponsive company. It would never be my first choice on any project. I would only use it if it was mandated by the client or by necessity.)
A high percentage of top-end developers are using tools such as Hibernate (Java) / NHibernate (.NET) to build their data access layers. Hibernate variants strongly encourage developers to start with development of the object model rather than the database model. The Hibernate application then generates the data model automatically and even handles data model updates. Hibernate variants can be used with any of the major database vendors. Changing your database choice can be as simple and painless as selecting a different database type in your configuration. On a side note, I should mention that while Hibernate and NHibernate are cross-database-compatible, they do not work on the lowest common denominator. The data access code in these applications is often designed to take advantage of special features within a given database engine. For example, NHibernate supports access to the NVarchar(Max) data type in SQL Server, which allows for very long strings.
In most applications, issues with database performance do not derive directly from the speed of reads and writes. Most of the issues relate to how the application manages the caching of frequently accessed data. For example, in an online blog site, it makes sense to cache blog posts once they have been read so they are not repeatedly fetched from the database. This caching mechanism is almost always primarily handled by the application code rather than the database server, though database servers do provide some caching. Hibernate/NHibernate have excellent caching support built in, as does Microsoft's ASP.NET and their new MVC framework built on top of ASP.NET.
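As a rough illustration of the blog-post caching described above, here is a minimal read-through cache in Python. It is a sketch under stated assumptions, not how Hibernate or ASP.NET do it: the `ReadThroughCache` class, the `load_post_from_db` function and the 60-second TTL are all hypothetical, and a real application would more likely lean on its framework's cache or something like memcached.

```python
import time


class ReadThroughCache:
    """Hypothetical in-process cache: ask the loader only on a miss or expiry."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}          # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]           # served from memory, no database round trip
        value = self._loader(key)   # miss: go to the database once
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Call this when the underlying row changes (e.g. a post is edited)."""
        self._entries.pop(key, None)


db_calls = []


def load_post_from_db(post_id):
    """Stand-in for a real query; records each call so misses are visible."""
    db_calls.append(post_id)
    return {"id": post_id, "title": f"Post {post_id}"}


cache = ReadThroughCache(load_post_from_db)
first = cache.get(7)    # miss: hits the "database"
second = cache.get(7)   # hit: served from the cache
```

The hard part in practice is invalidation: the cache has to be told when a post is edited, otherwise readers see stale data until the TTL runs out.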
Enterprise databases (SQL Server, Oracle, MySQL) are best for situations where functionality such as replication, clustering, huge datasets, etc. is required.
I don't like the MySQL licence: Firebird and PostgreSQL are better.
There is no real hot backup included in MySQL by Sun.
You can also look here, which is an interesting link with comments!
MySQL is free, but it takes an expert to maintain. Someone who naturally uses the command prompt and is not afraid to experiment. In some cases, MySQL problems are too complex, and the right people to troubleshoot them may not be available for any amount of money.
SQL Server is priced in the middle range. It can be maintained by "normal people", the kind who go home every day at 17:00 and have a natural disinclination to fifty-page HOW-TOs. SQL Server performs well in most instances but can break down in specific scenarios.
Oracle is the most expensive and requires highly paid operators. If you have the money, Oracle is a "safe" choice, because there's nothing Oracle won't do for money.
Three products, three markets!
A couple of pages listing gotchas (such as this and this) make me want to stay as far away from MySQL as possible. Here's a more neutral comparison of Postgres and MySQL.
As for the open source aspect others mentioned: MySQL is open source and free, only if your application is, too. If it's not, you need a commercial license.
My personal story:
Adding a new index to a table of about 10k rows.
MySQL side
about 30 seconds.
Postgres side
about 1 second.
I've worked with MySQL for years, and SQL Server only over the past year. I don't really see one being any easier or harder to use than the other in most cases. I do wish, however, that MSSQL had some of the features that MySQL possesses (e.g. being able to insert multiple rows on a single INSERT statement).
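For reference, the multi-row INSERT mentioned above looks like this. The sketch runs it against SQLite through Python's sqlite3 module as a stand-in (an assumption; the `VALUES (...), (...), (...)` syntax is the point), and the `tags` table is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (id INTEGER PRIMARY KEY, label TEXT)")

# One statement, three rows: each parenthesised group is a row.
conn.execute(
    "INSERT INTO tags (label) VALUES (?), (?), (?)",
    ("red", "green", "blue"),
)

labels = [label for (label,) in conn.execute(
    "SELECT label FROM tags ORDER BY id")]
```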
Also, if you don't have to use an RDBMS, check out Redis. It is basically memcached with persistence via asynchronous write-through. Its performance is not on the same scale as MySQL's.
Well... I guess the comparison isn't really fair to MySQL since Redis is not an RDBMS...