GIS: PostGIS/PostgreSQL vs. MySql vs. SQL Server? [closed]

EDIT: I have been using Postgres with PostGIS for a few months now, and I am satisfied.
I need to analyze a few million geocoded records, each of which will have latitude and longitude. These records include data of at least three different types, and I will be trying to see if each set influences the other.
What database is best as the underlying data store for all this data? Here are my requirements:
I'm familiar with the DBMS. I'm weakest with PostgreSQL, but I am willing to learn if everything else checks out.
It does well with GIS queries. Google searches suggest that PostgreSQL + PostGIS may be the strongest? At least a lot of products seem to use it. MySql's Spatial Extensions seem comparatively minimal?
Low cost. Despite the 10GB DB limit in SQL Server Express 2008 R2, I'm not sure I want to live with this and other limitations of the free version.
Not antagonistic with Microsoft .NET Framework. Thanks to Connector/Net 6.3.4, MySql works well with C# and .NET Framework 4 programs. It fully supports .NET 4's Entity Framework. I cannot find any noncommercial PostgreSQL equivalent, although I'm not opposed to paying $180 for Devart's dotConnect for PostgreSQL Professional Edition.
Compatible with R. It appears all 3 of these can talk with R using ODBC, so this may not be an issue.
I've already done some development using MySql, but I can change if necessary.

I have worked with all three databases and done migrations between them, so hopefully I can still add something to an old post. Ten years ago I was tasked with loading a largish dataset (450 million spatial objects) from GML into a spatial database. I decided to try out MySQL and Postgis. At the time there was no spatial support in SQL Server and we had a small startup atmosphere, so MySQL seemed a good fit. I subsequently became involved in MySQL: I attended and spoke at a couple of conferences and was heavily involved in the beta testing of the more GIS-compliant functions in MySQL that were finally released with version 5.5. I have since been involved with migrating our spatial data to Postgis and our corporate data (with spatial elements) to SQL Server. These are my findings.
MySQL
1). Stability issues. Over the course of 5 years, we had several database corruption issues, which could only be fixed by running myisamchk on the index file, a process that can take well over 24 hours on a 450 million row table.
2). Until recently only MyISAM tables supported the spatial data type. This means if you want transaction support you are out of luck. The InnoDB table type does now support spatial types, but not indexes on them, which, given the typical sizes of spatial data sets, isn't terribly useful. See http://dev.mysql.com/doc/refman/5.0/en/innodb-restrictions.html My experience from going to conferences was that spatial was very much an afterthought -- we've implemented replication, partitioning, etc, but it doesn't work with spatial.
EDIT: In the upcoming 5.7.5 release InnoDB will finally support indexes on spatial columns, meaning that ACID, foreign keys and spatial indexes will finally be available in the same engine.
3). The spatial functionality is extremely limited compared to both Postgis and SQL Server spatial. There is still no ST_Union function that acts on an entire geometry field, which is one of the queries I run most often, i.e., you can't write:
select some_attribute, ST_Union(geom) from some_table group by some_attribute
which is very useful in a GIS context. What you can write is select ST_Union(geom1, const_geom) from some_table, i.e., one of the geometries is a hard-coded constant geometry, which is a bit limiting in comparison.
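To make the grouped-aggregate query shape concrete, here is a minimal Python sketch that uses a SQLite user-defined aggregate as a stand-in. It is not ST_Union: it only merges axis-aligned bounding boxes, and the names parcels, owner and envelope_union are made up for illustration. The point is simply one aggregate call per group over a whole column.

```python
import sqlite3


class EnvelopeUnion:
    """Toy aggregate: grows one bounding box to cover every box in the group."""

    def __init__(self):
        self.box = None  # [minx, miny, maxx, maxy] once the first row arrives

    def step(self, minx, miny, maxx, maxy):
        if self.box is None:
            self.box = [minx, miny, maxx, maxy]
        else:
            self.box[0] = min(self.box[0], minx)
            self.box[1] = min(self.box[1], miny)
            self.box[2] = max(self.box[2], maxx)
            self.box[3] = max(self.box[3], maxy)

    def finalize(self):
        # SQLite aggregates return a scalar, so the box is encoded as text.
        if self.box is None:
            return None
        return ",".join(str(v) for v in self.box)


conn = sqlite3.connect(":memory:")
conn.create_aggregate("envelope_union", 4, EnvelopeUnion)
conn.execute(
    "CREATE TABLE parcels (owner TEXT, minx REAL, miny REAL, maxx REAL, maxy REAL)"
)
conn.executemany(
    "INSERT INTO parcels VALUES (?, ?, ?, ?, ?)",
    [
        ("alice", 0, 0, 1, 1),
        ("alice", 2, 1, 3, 2),
        ("bob", 5, 5, 6, 6),
    ],
)

# Same shape as: select some_attribute, ST_Union(geom) ... group by some_attribute
merged = {
    owner: [float(v) for v in box.split(",")]
    for owner, box in conn.execute(
        "SELECT owner, envelope_union(minx, miny, maxx, maxy) "
        "FROM parcels GROUP BY owner"
    )
}
print(merged)
```

In PostGIS the aggregate form of ST_Union plays this role and returns a real merged geometry rather than an envelope.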
4). No support for rasters. Being able to do combined vector-raster analysis within a db is very useful GIS functionality.
5). No support for conversion from one spatial reference system to another.
6). Since the acquisition by Oracle, spatial has really been put on hold.
Overall, to be fair to MySQL it supported our website, WMS and general spatial processing for several years, and was easy to set up. On the downside, data corruption was an issue, and by being forced to use MyISAM tables you are giving up a lot of the benefits of an RDBMS.
Postgis
Given the issues we had with MySQL, we ultimately converted to Postgis. The key points of this experience have been:
1). Extreme stability. No data corruption in 5 years and we now have around 25 Postgres/GIS boxes on centos virtual machines, under varying degrees of load.
2). Rapid pace of development -- raster, topology, 3D support being recent examples of this.
3). Very active community. The Postgis irc channel and mailing list are excellent resources. The Postgis reference manual is also excellent. http://postgis.net/docs/manual-2.0/
4). Plays very well with other applications, under the OSGeo umbrella, such as GeoServer and GDAL.
5). Stored procedures can be written in many languages, apart from the default plpgsql, such as Python or R.
6). Postgres is a very standards compliant, fully featured RDBMS, which aims to stay close to the ANSI standards.
7). Support for window functions and recursive queries -- not in MySQL, but in SQL Server. This has made writing more complex spatial queries cleaner.
SQL Server
I have only used SQL Server 2008 spatial functionality, and many of the annoyances of that release -- lack of support for conversions from one CRS to another, the need to add your own parameters to spatial indexes -- have now been resolved.
1). As spatial objects in SQL Server are basically CLR objects, the syntax feels backwards. Instead of ST_Area(geom) you write geom.STArea() and this becomes even more obvious when you chain functions together. The dropping of the underscore in function names is merely a minor annoyance.
2). I have had a number of invalid polygons that have been accepted by SQL Server, and the lack of an ST_MakeValid function can make this a bit painful.
3). Windows only. In general, Microsoft products (like ESRI ones) are designed to work very well with each other, but don't always have standards compliance and interoperability as primary objectives. If you are running a Windows-only shop, this is not an issue.
UPDATE: having played a bit with SQL Server 2012, I can say that it has been improved significantly. There is now a good geometry validation function, and there is good support for the Geography data type, including a FULLGLOBE object, which allows representing objects that occupy more than one hemisphere. There is also support for Compound Curves and Circular Strings, which is useful for accurate and compact representations of arcs (and circles), among other things. Transforming coordinates from one CRS to another still needs to be done in 3rd party libraries, though this is not a show stopper in most applications.
I haven't used SQL Server with large enough datasets to compare one on one with Postgis/MySQL, but from what I have seen the functions behave correctly, and while not quite as fully featured as Postgis, it is a huge improvement on MySQL's offerings.
Sorry for such a long answer, I hope some of the pain and joy I have suffered over the years might be of help to someone.

If you are interested in a thorough comparison, I recommend "Cross Compare SQL Server 2008 Spatial, PostgreSQL/PostGIS 1.3-1.4, MySQL 5-6" and/or "Compare SQL Server 2008 R2, Oracle 11G R2, PostgreSQL/PostGIS 1.5 Spatial Features" by Boston GIS.
Considering your points:
I'm familiar with the DBMS: setting up a PostGIS database on Windows is easy, and management using PgAdmin3 is straightforward too
It does well with GIS queries: PostGIS is definitely the strongest of the three. Only Oracle Spatial would be comparable, but it is disqualified if you consider its costs
Low cost: +1 for PostGIS for sure
Not antagonistic with Microsoft .NET Framework: You should at least be able to connect via ODBC (see Postgres wiki)
Compatible with R: shouldn't be a problem with any of the three

PostGIS, definitely. Here's why.
Postgres is far superior to MySQL in performance. The server is more fault tolerant and has out-of-the-box tools for load balancing, caching and optimization.
PostGIS is becoming a standard in GIS apps.
It's free.

Just a note that MySQL has finally added proper GIS logic.
http://dev.mysql.com/doc/refman/5.6/en/functions-for-testing-spatial-relations-between-geometric-objects.html
But I can't comment on cost or performance at this stage.

PostGIS is best because it is becoming a standard in GIS applications these days and it is free. It is far superior to MySQL in performance.

Related

PostgreSQL VS MySQL while dealing with GeoDjango in Django

There are multiple tutorials and questions across the Internet, YouTube and StackOverflow on finding nearby businesses given a location, for example (question on StackOverflow):
Returning nearby locations in Django
But one thing common to all of these is that they all prefer PostgreSQL (instead of MySQL) for Django's GeoDjango library.
I am building a project as follows:
Here a user can register as a customer as well as a business (the customer's and the business's name, address, etc. fields will all be separate, even if it's the same user)
This is not the actual database design, only a rough idea of the project
Both customer and business will have their locations stored
Customers can find nearby businesses around them
I was wondering what the specific advantages of using PostgreSQL over MySQL are in the context of computing and fetching location-related fields.
(MySQL has been a well-tested database for years and most of my data is relational, so I was planning to use MySQL or Microsoft SQL Server)
Would there be any processing disadvantages, in terms of the algorithms used to compute nearby businesses, if I choose to go with MySQL? How would it make my system slow?
But one thing common to all of these is that they all prefer PostgreSQL (instead of MySQL) for Django's GeoDjango library.
The reason why they suggest using Postgres is that it has better support for spatial data. It's not that MySQL doesn't support spatial data. However, there is a long list of features which Postgres supports and MySQL doesn't. You can look at this page for details. Almost every time MySQL is mentioned on that page, it is to describe a feature that it does not support, but that Postgres does.
(MySQL has been a well-tested database for years and most of my data is relational, so I was planning to use MySQL or Microsoft SQL Server)
Note that foreign key constraints are not compatible with MyISAM, which is the only MySQL database engine that supports spatial indexes. So if you pick MySQL, you need to choose between referential integrity and fast spatial lookups.
If you use Postgres, you can have both referential integrity and fast spatial lookups. Postgres is also a quite mature and widely used relational database these days.
Would there be any processing disadvantages, in terms of the algorithms used to compute nearby businesses, if I choose to go with MySQL? How would it make my system slow?
It really depends on how many businesses you're searching for. If you pick an engine that does not support spatial indexes, MySQL is forced to do a full table scan, which takes O(N) time. On the other hand, it can do bounding box comparisons to eliminate many geometries quite quickly. I have seen acceptable interactive performance for 100k points, with performance dropping off after that. In contrast, Postgres with a spatial index is fast for any number of points.
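The full-scan-with-bounding-box approach described above can be sketched in plain Python. This is only an illustration of the idea, not what MySQL does internally: the function names, the sample points and the spherical-Earth radius are all assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; good enough for a sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(points, lat, lon, radius_km):
    """Full scan: cheap bounding-box test first, exact distance for survivors."""
    delta = radius_km / EARTH_RADIUS_KM  # search radius as an angle (radians)
    dlat = math.degrees(delta)
    ratio = math.sin(delta) / max(math.cos(math.radians(lat)), 1e-12)
    # Near the poles or for huge radii the box spans every longitude.
    if delta < math.pi / 2 and ratio < 1:
        dlon = math.degrees(math.asin(ratio))
    else:
        dlon = 180.0

    hits = []
    for name, plat, plon in points:  # O(N): every row is visited
        lon_gap = abs(plon - lon)
        lon_gap = min(lon_gap, 360.0 - lon_gap)  # handle antimeridian wrap
        if abs(plat - lat) > dlat or lon_gap > dlon:
            continue  # rejected without any trigonometry
        dist = haversine_km(lat, lon, plat, plon)
        if dist <= radius_km:
            hits.append((name, dist))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical businesses as (name, latitude, longitude).
businesses = [
    ("cafe", 52.52, 13.405),
    ("potsdam_shop", 52.39, 13.065),
    ("hamburg_store", 53.55, 9.99),
]
result = nearby(businesses, 52.52, 13.405, radius_km=50)
print(result)
```

The loop still touches every row, which is why this degrades as the table grows; a spatial index lets the database skip most rows instead of testing them one by one.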

Database for handling large data

We have started a new project using MySQL, Spring Boot, and AngularJS. Initially, we did not realize our DB was going to handle large data.
The number of tables will not be large (<130), and only 10 to 20 tables will contain most of the data, which is mostly inserted, read, and updated.
The estimated amount of data in those 10 tables is going to grow by 12,00,000 (1.2 million) records a month, and we should not delete that data, so that we can run various reports.
There needs to be a (read-only) replicated database as a backup/failover, and maybe for offloading reports at peak time.
I don't have first-hand experience with databases that large, so I'm asking those who do which DB is the best choice in this situation. We have completed 100% of the coding and development, and only now do we realize this. I have doubts about whether MySQL can handle large data. I know that Oracle is the safe bet, but I'm interested in whether MySQL with a similar setup would do. We are not bound to MySQL; I am OK with any DB, and I can make the call based on your feedback.
An open source DB is preferable, but it's not mandatory; we can go for a paid DB as well.
Handling Large Data
MySQL is more than capable of handling such loads. In fact, it is capable of handling much much more load than what you are talking about. You just have to create the right kind of tables. You can do that by choosing
the correct storage engine for your use-case
the correct character set
the optimal data type for your column
the right indexing strategy - creating indexes thoughtfully
the right partitioning strategy (if the data in the table exceeds tens of millions of records)
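As a small illustration of why the indexing point in the list above matters, here is a hedged sketch using SQLite from Python as a stand-in for MySQL (MySQL has its own EXPLAIN, with different output). The readings table and idx_readings_device index are made-up names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (id INTEGER PRIMARY KEY, device_id INTEGER, value REAL)"
)
conn.executemany(
    "INSERT INTO readings (device_id, value) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(10_000)],
)


def plan(sql):
    # The fourth column of each EXPLAIN QUERY PLAN row is the readable detail.
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT value FROM readings WHERE device_id = 42"

before = plan(query)  # no usable index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_readings_device ON readings (device_id)")
after = plan(query)   # the planner can now search the index instead

print("before:", before)
print("after: ", after)
```

Checking the plan before and after adding an index is a cheap way to confirm that an index is actually being used by the queries it was created for.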
EDIT: You've also got to choose the right kind of data modelling and normalization strategy for your use-case. Most OLTP applications require some level of normalization. But if you want to do analytics and aggregates on heavy tables, you should either have a Data Warehouse or have highly denormalized tables to avoid joins, and/or have a column-oriented database to support such queries.
MySQL is open-source and has a very strong community support so you will find a lot of literature around any issue that you face. You can also find all the filed bugs (resolved and unresolved) here.
As far as the number of tables is concerned, there's really no cap on that. See here, MySQL permits 4 billion tables if you're using InnoDB as the engine.
A lot of very big companies with scale use MySQL in some capacity. Facebook is one of them.
Native JSON Support
With the growing popularity of JSON as the de facto data exchange format across the internet, MySQL has also provided native JSON support in 5.7, so now you can store and query JSON from your APIs, if required.
HA and Replication
MySQL Replication works! Earlier, MySQL used to support coordinate replication only but now it supports GTID replication which makes it easier to maintain and fix replication issues. There are third-party replicators also available in the market. For instance, Continuent's Tungsten is a replicator written in Java and is a replacement for native replication. It comes with a lot of configuration options which are not available with native MySQL replication.
I agree with MontyPython, MySql can do it and the design is critical. Fortunately MySql allows you to be flexible over time as needed.
I've had history tables used in daily reporting that grew to over a billion records in plain MySql and had no problems.
I've also used MySql Merge tables to divide up tables with big-ish rows (100KB+) to speed things up. Basically keeping the individual merge table file sizes under 30GB each. However that solution increases the open file count (in the system) per client - might be a bigger deal on a clustered system. That one was not.
That said, I like to give Honorable Mention to:
MariaDB - MySql but with contributions from Facebook, Alibaba, Google, and more.
I've moved most of my MySql community edition projects over to MariaDB and have been very happy. It's an almost transparent upgrade.
They offer an interesting enterprise Big Data Analytics (MariaDB AX) package, but with your current requirements it's probably overkill and the standard community edition will fulfill your needs.
For example, here's an informative tutorial on how to set up a scalable Cluster (Galera) and adding MaxScale for High Availability:
https://mariadb.com/resources/blog/getting-started-mariadb-galera-and-mariadb-maxscale-centos
Another interesting option is Vitess, developed at YouTube, which allows for sharded MySQL through a (mostly) driver-based solution. It addresses the problem of needing access to huge amounts of data while always yielding good performance. As such, it goes beyond high availability and focuses on a solution wherein no single query (i.e. a report against millions of rows of historical data) can negatively impact the other queries needing to be performed.

MySQL vs PostgreSQL Concerns w/ GIS & Speed

I'm aware there are a few threads out there addressing this issue, but I'm wondering if anything has changed since those have been published.
I'm looking to build a GIS webapp, and people are all saying that PostgreSQL is the way to go because it supports various things that have to do with mapping better, whereas MySQL's spatial extensions aren't too great.
So PostgreSQL seems like the way to go, but everywhere I go I'm reading that PostgreSQL is terribly slow compared to MySQL, is this still true?
If I want to use GeoDjango with MySQL, will I be able to do most everything?
I'm really stuck between the two, simply because people keep saying PostgreSQL is really slow, but MySQL isn't really great for dealing with GIS stuff.
What's your take SO?
No, postgresql is not slower. This myth is due to people running single threaded sequential benchmarks on myisam vs postgresql. Benchmarks that attempt to model actual usage conditions with many concurrent queries put postgresql on par with or ahead of mysql in performance, especially as you scale up in CPUs/cores.
http://www.randombugs.com/linux/mysql-postgresql-benchmarks.html
http://tweakers.net/reviews/657/5
In my opinion, it's silly to compare MySQL and PostgreSQL in terms of speed if there are variables unknown, such as - what's your budget, what's your target system output and what's your load rate?
Both RDBMSs are great, and they can be scaled. The difference is that MySQL has pluggable engine architecture, allowing it to plug in various engines. Natively, MySQL supports 9 engines if I'm not mistaken but it has a plethora of commercial engines to choose from, along with 2 popular forks (Percona's and MariaDB) that introduce various enhancements, especially for InnoDB storage engine.
Real question is, what does it mean that something is "bad" at GIS "stuff"? What does bad mean? Can't calculate something? Can't store something? I just don't get what you consider bad really.
I doubt you can go wrong by choosing either of the two databases, just beware of false benchmarks claiming one product is faster than another. Set your goal in terms of performance, install both products on your test machine and run them. If both satisfy your performance needs, use the one you feel more comfortable developing with.
Check this topic: GIS: PostGIS/PostgreSQL vs. MySql vs. SQL Server?
PostGIS is much more mature and complete, and competes with Oracle and SQL Server, not MySQL. Sorry.
When it comes to GIS capabilities, have a look at this GIS SE question:
Would PostGIS offer an advantage over MySQL for a produce farm application?
I think that from all that I read here and on GIS SE site, PostgreSQL with PostGIS is a clear winner when it comes to handling spatial data.

Disadvantages of MySQL versus other databases

Every single book that teaches programming (or almost anything else) starts off with a whole bunch of spiel on why what it's about (C++, MySQL, waterskiing, skydiving, dentistry, whatever) is the greatest thing in the world. So I open the MySQL O'Reilly book, and read the intro, and get the traditional sermon. The main points that the book mentioned were:
MySQL has been shown to have tied Oracle as the fastest and most scalable database software.
It's free and open source.
Sounds pretty convincing, but I know there are always at least two sides to every story. I knew I needed to be disillusioned when I saw someone suggest to someone to use Oracle instead of MySQL and thought, "Why in the world would you want to do that?!", just because of the few paragraphs I'd read, with no other justification. So let's investigate the other side of the story:
What are some reasons NOT to use MySQL?
Here's just a random list of stuff that popped into my head. It's CW, so feel free to add to it as necessary.
Oracle provides a top notch ERP built on their database. If your company is subject to Sarbanes-Oxley regulations, this is quite a bit above "crucial."
SQL Server licenses come with Analysis Services, Integration Services, and Reporting Services. If you want to do anything with OLAP, ETL, or reporting, these three are great applications that are built on the SQL Server stack.
SQL Server has native .NET data types (in 2008). Absolutely brilliant for .NET shops dealing with geospatial datasets.
MySQL does not support check constraints.
SQL Server includes the over clause, which helps when dealing with the "top n rows in each group" problem. Essentially, you can do aggregate functions partitioned over the dataset any way you'd like.
SQL Server uses Kerberos and Windows authentication natively. MySQL does not tie into Active Directory.
Superior performance on subqueries (almost any database has subquery performance that is superior to MySQL's)
Oracle, SQL Server, PostgreSQL and others have a richer set of join algorithms available to them; this means joins can often be performed faster, especially when large tables are involved.
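The "top n rows in each group" use of the over clause mentioned in the list above looks roughly like this. The sketch runs the SQL through SQLite from Python (window functions need SQLite 3.25 or newer); the sales table and its columns are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, rep TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("north", "ann", 300), ("north", "bo", 500), ("north", "cy", 100),
        ("south", "di", 200), ("south", "ed", 900), ("south", "fay", 400),
    ],
)

# Number the rows inside each region by descending amount, then keep the first two.
top2 = conn.execute(
    """
    SELECT region, rep, amount
    FROM (
        SELECT region, rep, amount,
               ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS rn
        FROM sales
    )
    WHERE rn <= 2
    ORDER BY region, amount DESC
    """
).fetchall()
print(top2)
```

Without window functions the same result usually needs a correlated subquery or a self-join per group, which is harder to read and often slower.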
MySQL has been shown to have tied oracle as the fastest and most scalable database software.
Making that statement about any two database systems is probably enough to throw the book away without reading the rest. Database systems are not commodities that can be compared with a couple lines of information, and will not be for the foreseeable future.
One reason that the statement is obviously false is that MySQL has very limited plan choices available. For instance, MySQL can't use merge join or hash join -- two fundamental algorithms that have useful performance characteristics. That's pretty much the end of the story for many query workloads. It is trivial to show a reasonable query that is orders of magnitude faster with a merge join.
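For readers who have not met these algorithms, here is a toy Python sketch of a nested-loop join next to a hash join and a merge join, on lists of (key, value) tuples. It is a simplified illustration of the techniques, not how any particular database implements them.

```python
def nested_loop_join(left, right):
    """Compare every left row with every right row: O(len(left) * len(right))."""
    return [(lk, lv, rv) for lk, lv in left for rk, rv in right if lk == rk]


def hash_join(left, right):
    """Build a hash table on one input, probe it with the other: roughly O(n + m)."""
    buckets = {}
    for rk, rv in right:
        buckets.setdefault(rk, []).append(rv)
    return [(lk, lv, rv) for lk, lv in left for rv in buckets.get(lk, [])]


def merge_join(left, right):
    """Walk two key-sorted inputs in step; the sort is skipped if they are pre-sorted."""
    left = sorted(left, key=lambda row: row[0])
    right = sorted(right, key=lambda row: row[0])
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row with that key against the whole run.
            while i < len(left) and left[i][0] == lk:
                for _, rv in right[j:j_end]:
                    out.append((lk, left[i][1], rv))
                i += 1
            j = j_end
    return out


orders = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
items = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]

nl_rows = sorted(nested_loop_join(orders, items))
hash_rows = sorted(hash_join(orders, items))
merge_rows = sorted(merge_join(orders, items))
print(merge_rows)
```

All three return the same rows; the difference is the amount of work, which is what makes the choice of join algorithm matter on large tables.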
There are plenty of other criticisms of MySQL versus XYZ and vice-versa. My point is that this is a complex issue, and the book is drastically oversimplifying. If you're getting involved in databases at all, you need to spend time diversifying your knowledge and understanding fundamentals.
My personal opinion is that MySQL and SQLite are the worst places to start. Pick something like Oracle (which can be downloaded free of charge for learning/evaluation, which many don't realize), PostgreSQL (BSD license), or MS SQL. FirebirdSQL might be good, too. Once you familiarize yourself with a few systems, you'll be able to make an informed choice about whether the trade-offs MySQL makes are right for you.
Everyone seems to be missing one of the main reasons to stick with Oracle/MS. You've already got a stable full of DBAs that know those products inside and out.
The default collation in mysql is case-insensitive. This is not a problem per se, but I think this strange default is an indication that it was targeted at hobby-developers, rather than professionals. This is a big assumption, but I'd think any professional would expect a database to compare strings for identity by default (i.e. using a binary collation).
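The difference between a case-insensitive and a binary collation is easy to see in a small experiment. The sketch below uses SQLite from Python only as a convenient stand-in (its NOCASE and BINARY collations are SQLite names, and MySQL's collation names differ); the users table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (name_ci TEXT COLLATE NOCASE, name_bin TEXT COLLATE BINARY)"
)
conn.execute("INSERT INTO users VALUES ('Alice', 'Alice')")

# The same lookup against each column: the collation decides whether 'alice' matches.
ci_hits = conn.execute(
    "SELECT COUNT(*) FROM users WHERE name_ci = 'alice'"
).fetchone()[0]
bin_hits = conn.execute(
    "SELECT COUNT(*) FROM users WHERE name_bin = 'alice'"
).fetchone()[0]
print(ci_hits, bin_hits)
```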
Manipulation of tables during transactions causes implicit COMMITs. While this might not look grave at first glance, you will notice that you cannot work under ACID conditions if altering/creating tables is an inherent part of your application.
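For contrast, this is what transactional DDL looks like in an engine that has it. The sketch uses SQLite from Python (PostgreSQL behaves similarly); the table names are made up. With the implicit-commit behaviour described above, the INSERT would already have been committed by the time ROLLBACK runs.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# BEGIN/ROLLBACK below are exactly the statements sent to the engine.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")

conn.execute("BEGIN")
conn.execute("INSERT INTO accounts (balance) VALUES (100)")
conn.execute("CREATE TABLE audit_log (msg TEXT)")  # DDL inside the transaction
conn.execute("ROLLBACK")

# Both the new row and the new table are gone after the rollback.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
row_count = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
print(tables, row_count)
```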
MySQL can certainly match or beat Oracle in speed. I've done it numerous times myself. Ok, so I had to use various table types like black hole, merge, innodb, and myisam in just the right places. And it took me a few days to get everything working just right. The Oracle DBA got things working in an hour or two.
MySQL is fine for 98% of the sites out there, maybe more. But it is fairly easy to bring it to a crawl without a lot of data if you don't know what you are doing. Oracle is quite a bit harder to bring to a crawl, but it can still be done. I've worked with both with datasets in the hundreds of millions of records (tiny by some measures). MySQL takes quite a bit more attention.
No database can scale indefinitely, which is why nosql "databases" are becoming so popular. I think the real question is if MySQL is "good enough" for what you need to do. The price is certainly right. The same could be said about PHP.
Why does Facebook use MySQL? Could you imagine what it would cost them to buy enough Oracle licenses!? It's good enough.
The future of Sun (the company behind MySQL) is unclear, and you don't know whether there will be a company to back the product.
MySQL is very tolerant of ambiguities -- something you don't want in a database system. Here are a few examples off the top of my head:
As another poster stated, CHAR and VARCHAR columns are case-insensitive, already a pretty bad sign.
You can INSERT into a table that has a column without a default value that is also NOT NULL. Yes, really! Instead of throwing an error, MySQL will pick a value for you based on the data type, e.g. 0 for numbers.
You can use a GROUP BY statement while some columns are neither using an aggregate function, nor included in the GROUP BY statement. The outcome is pretty much random. No warnings or errors here either, in my experience.
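The GROUP BY ambiguity in the last item can be reproduced with SQLite, which happens to be similarly permissive about bare columns, so it is used here from Python as a stand-in. The orders table is hypothetical, and which item comes back in the loose query is not something to rely on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, item TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("ann", "apple", 1), ("ann", "pear", 5), ("bob", "fig", 2)],
)

# 'item' is neither aggregated nor grouped: the engine silently picks one value.
loose = conn.execute(
    "SELECT customer, item, SUM(qty) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

# Stating the intent removes the ambiguity.
strict = conn.execute(
    "SELECT customer, MIN(item), SUM(qty) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(loose)
print(strict)
```

Stricter engines reject the first query outright, which turns a silent data problem into an immediate error.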
MySQL is also far from rock-solid. Just this month, I discovered a bug in the (admittedly old, but a "stable release") version of MySQL used by DreamHost that results in data loss. (Certain conditions when creating a table with variable-length rows.)
I've been using MySQL for many years and still do, but would never dream of using it for anything serious, where data loss would be a big problem. It's great for non-mission-critical web sites and blogs though.
I knew I needed to be disillusioned when I saw someone suggest to someone to use oracle instead of MySQL and thought, "Why in the world would you want to do that?!"
Because your company has been using Oracle for the past ten years, or because you equate enterprise usage with 'must be good' and open-source with 'free crap'. That's just about the only reason. Everyone I know who has worked with Oracle loathes it. Everyone I know who has worked with MySQL, assuming they don't love it, at least consider it a better alternative to Oracle in almost every regard.
SQL RDBMSs are so complex, though, that in almost every respect there's something one DB does that another doesn't. It is also, unfortunately, a fact of comparing databases that people quote statistics without using properly configured servers. If you have two default configurations for a server, one might be better than the other, but that's about as far as the comparisons usually go. They don't reflect the fact that these gigantic applications have a million little switches and toggles you can use to speed certain things up, increase reliability and generally screw up bad science.
MySQL tends to be a very general purpose database system, you can use it for almost anything that you'd use Oracle, SQL Server, PostgreSQL, DB2, etc for.
However, these different systems have different strengths. PostgreSQL has a ton more functionality than MySQL and can handle some very specific tasks that MySQL struggles with. SQL Server usually integrates with Microsoft products very easily, whereas with MySQL you'd have to do some extra work to make them play together. Oracle is MASSIVE; they're not just databases, and when you're dealing with large, expansive systems Oracle probably has the gear to cover everything under one roof, whereas you'd need to tie a bunch of disparate systems together to have MySQL as your database system.
Whether or not to use MySQL should be based upon whether or not it is reasonable to use MySQL.
Disclaimer: I have been using MySQL since 2001 and still love it, but here are a few reasons that make me doubt my fidelity...
There are some false arguments (they were true a few years ago) in some of the answers I read. Before making a choice, check the MySQL documentation and its up-to-date list of features. You could be surprised.
Every DB server lacks some functionality. This is not a real blocking issue if you do not specifically need it.
For me, the main issues are elsewhere:
The time needed to have a bug fixed and published in a stable release. It is a shame. (For some bugs... it takes years (no kidding)!)
The frequency of stable releases.
But since this year, the new issues are:
The increasing number of branches (Percona, Google, Facebook, etc.).
Sun is unclear about its strategy.
Many MySQL employees left the company.
It's free and open source.
True. But keep in mind that MySQL is, in many cases, not free for commercial use. MySQL and the connectors (the official drivers for various languages), are GPL licensed.
If you use, say, the Connector/.NET to connect to MySQL your code has to be GPL compatible. It's dual licensed though, so you can buy an enterprise version under another license - and I believe they have a (either free or just very cheap) program that lets you license the connectors under a different license.
Everyone I know using MySQL is unaware of this :-)
Basically, there are several choices for a database. Frankly, in today's world, DB choice is less important than it was a few years ago. Here are a few issues to consider.
Most of the current database systems in widespread use such as SQL Server (and SQL Server Express), Oracle, MySQL, SQLLite, etc. are relatively standards compliant and can be used somewhat interchangeably. Some serve different niche markets. For example, SQL Server, MySQL, and Oracle are all good choices for large Enterprise applications. SQLLite is very good for applications which deploy on a client and need a local database with a small footprint and minimal configuration. (In my opinion, Oracle is extremely over-priced, is backed by an arrogant unresponsive company. It would never be my first choice on any project. I would only use it if it was mandated by the client or by necessity.)
A high percentage of top-end developers are using tools such as Hibernate(Java)/NHibernate(.NET) to build their data access layers. Hibernate variants strongly encourage developers to start with development of the object model rather than the database model. The Hibernate application then generates the data model automatically--and even handles data model updates. Hibernate variants can be used with any of the major database vendors. Changing your database choice can be as simple and painless as selecting a different database type in your configuration. On a side note, I should mention that while Hibernate and NHibernate are cross-database-compatible, they do not work on the lowest common denominator. The data access code in these applications is often designed to take advantages of special features within a given database engine. For example NHibernate supports access to the NVarchar(Max) data type in SQL Server which allows for very long strings.
In most applications, issues with database performance do not derive directly from the speed of reads and writes. Most of the issues relate to how the application manages the caching of frequently accessed data. For example, in online blog site, it makes sense to cache blog posts once they have been read so they are not repeatedly fetched from the database. This caching mechanism is almost always primarily handled by the application code rather than database server--though database servers do provide some caching. Hibernate/NHibernate have excellent caching support built in as does Microsoft's ASP.NET and their new MVC framework built on top of ASP.NET.
Enterprise databases (SQL Server, Oracle, MySQL) are best for situations where functionality such as replication, clustering, huge datasets, etc. is required.
I don't like MySQL's licence: Firebird and PostgreSQL are better.
There is no real hot backup included in Sun's MySQL.
You can also look here, which is an interesting link with comments.
MySQL is free, but it takes an expert to maintain. Someone who naturally uses the command prompt and is not afraid to experiment. In some cases, MySQL problems are too complex, and the right people to troubleshoot them may not be available for any amount of money.
SQL Server is priced in the middle range. It can be maintained by "normal people", the kind who go home every day at 17:00 and have a natural disinclination to fifty-page HOW-TOs. SQL Server performs well in most instances but can break down in specific scenarios.
Oracle is the most expensive and requires highly paid operators. If you have the money, Oracle is a "safe" choice, because there's nothing Oracle won't do for money.
Three products, three markets!
A couple of pages listing gotchas (such as this and this) make me want to stay as far away from MySQL as possible. Here's a more neutral comparison of Postgres and MySQL.
As for the open source aspect others mentioned: MySQL is open source and free, only if your application is, too. If it's not, you need a commercial license.
My personal story:
Adding a new index to a table of about 10k rows.
MySQL side
about 30 seconds.
Postgres side
about 1 second.
I've worked with MySQL for years, and SQL Server only over the past year. I don't really see one being any easier or harder to use than the other in most cases. I do wish, however, that MSSQL had some of the features that MySQL possesses (e.g. being able to insert multiple rows in a single INSERT statement).
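For readers unfamiliar with the feature, a multi-row INSERT looks like the sketch below. It uses Python's built-in `sqlite3` module only so that it runs without a server; the `city` table and its rows are made up for illustration. MySQL and PostgreSQL accept the same `VALUES (...), (...)` form, and SQL Server added it in the 2008 release.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (name TEXT, country TEXT)")

# One INSERT statement carrying three rows.
conn.execute(
    "INSERT INTO city (name, country) VALUES (?, ?), (?, ?), (?, ?)",
    ("Oslo", "NO", "Lima", "PE", "Pune", "IN"),
)

# Driver-level alternative: one parameterized statement run once per row.
more_rows = [("Kyoto", "JP"), ("Accra", "GH")]
conn.executemany("INSERT INTO city (name, country) VALUES (?, ?)", more_rows)
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM city").fetchone()[0]
```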
Also, if you don't have to use an RDBMS, check out Redis. It is basically memcached with persistence and asynchronous write-through. Its performance is not on the same scale as MySQL's.
Well... I guess the comparison isn't really fair to MySQL, since Redis isn't an RDBMS...

MySQL vs PostgreSQL? Which should I choose for my Django project?

My Django project is going to be backed by a large database with several hundred thousand entries, and will need to support searching (I'll probably end up using djangosearch or a similar project.)
Which database backend is best suited to my project and why? Can you recommend any good resources for further reading?
For whatever it's worth, the creators of Django recommend PostgreSQL.
If you're not tied to any legacy
system and have the freedom to choose
a database back-end, we recommend
PostgreSQL, which achieves a fine
balance between cost, features, speed
and stability. (The Definitive Guide to Django, p. 15)
As someone who recently switched a project from MySQL to PostgreSQL, I don't regret the switch.
The main difference, from a Django point of view, is more rigorous constraint checking in PostgreSQL, which is a good thing. It's also a bit more tedious to do manual schema changes (aka migrations).
There are probably 6 or so Django database migration applications out there, and at least one doesn't support PostgreSQL. I don't consider this a disadvantage, though, because you can use one of the others or do the migrations manually (which is what I prefer at the moment).
Full text search might be better supported for MySQL. MySQL has built-in full text search supported from within Django but it's pretty useless (no word stemming, phrase searching, etc.). I've used django-sphinx as a better option for full text searching in MySQL.
Full text searching is built in to PostgreSQL 8.3 and later (earlier versions need the TSearch module). Here's a good instructional blog post: Full-text searching in Django with PostgreSQL and tsearch2
large database with several hundred
thousand entries,
This is not a large database; it's a very small one.
I'd choose PostgreSQL, because it has a lot more features. Most significant in this case: in PostgreSQL you can use Python as a procedural language.
Go with whichever you're more familiar with. MySQL vs PostgreSQL is an endless war. Both of them are excellent database engines and both are being used by major sites. It really doesn't matter in practice.
All the answers bring interesting information to the table, but some are a little outdated, so here's my grain of salt.
As of 1.7, migrations are now an integral feature of Django. So they documented the main differences that Django developers might want to know beforehand.
Backend Support
Migrations are supported on all backends that Django ships with, as
well as any third-party backends if they have programmed in support
for schema alteration (done via the SchemaEditor class).
However, some databases are more capable than others when it comes to schema migrations; some of the caveats are covered below.
PostgreSQL
PostgreSQL is the most capable of all the databases here in terms of schema support.
MySQL
MySQL lacks support for transactions around schema alteration operations, meaning that if a migration fails to apply you will have to manually unpick the changes in order to try again (it’s impossible to roll back to an earlier point).
In addition, MySQL will fully rewrite tables for almost every schema operation and generally takes a time proportional to the number of rows in the table to add or remove columns. On slower hardware this can be worse than a minute per million rows - adding a few columns to a table with just a few million rows could lock your site up for over ten minutes.
Finally, MySQL has relatively small limits on name lengths for columns, tables and indexes, as well as a limit on the combined size of all columns an index covers. This means that indexes that are possible on other backends will fail to be created under MySQL.
SQLite
SQLite has very little built-in schema alteration support, and so
Django attempts to emulate it by:
Creating a new table with the new schema
Copying the data across
Dropping the old table
Renaming the new table to match the original name
This process generally works well, but it can be slow and occasionally
buggy. It is not recommended that you run and migrate SQLite in a
production environment unless you are very aware of the risks and its
limitations; the support Django ships with is designed to allow
developers to use SQLite on their local machines to develop less
complex Django projects without the need for a full database.
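The four emulation steps quoted above can be sketched directly with Python's `sqlite3` module. This is only an illustration of the idea, not Django's actual `SchemaEditor` code; `rebuild_table` and the `post` table are hypothetical names, and the sketch ignores indexes, triggers and foreign keys that a real implementation has to carry over.

```python
import sqlite3


def rebuild_table(conn, table, new_schema_sql, columns):
    """Emulate an ALTER TABLE by rebuilding the table with a new schema.

    Identifiers are interpolated directly, so pass trusted names only.
    """
    tmp = f"new__{table}"
    cols = ", ".join(columns)
    conn.execute(f"CREATE TABLE {tmp} ({new_schema_sql})")  # 1. new table
    conn.execute(f"INSERT INTO {tmp} ({cols}) SELECT {cols} FROM {table}")  # 2. copy
    conn.execute(f"DROP TABLE {table}")  # 3. drop the old table
    conn.execute(f"ALTER TABLE {tmp} RENAME TO {table}")  # 4. rename
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT, legacy_flag INTEGER)"
)
conn.execute("INSERT INTO post (title, legacy_flag) VALUES ('hello', 1)")

# "Drop" the legacy_flag column by rebuilding the table without it.
rebuild_table(conn, "post", "id INTEGER PRIMARY KEY, title TEXT NOT NULL", ["id", "title"])

columns_after = [row[1] for row in conn.execute("PRAGMA table_info(post)")]
titles = [row[0] for row in conn.execute("SELECT title FROM post")]
```

The copy step touches every row, which is why the quoted documentation warns that the process can be slow on larger tables.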
Even if PostgreSQL looks better, I find it has some performance issues with Django:
PostgreSQL is made to handle "long connections" (connection pooling, persistent connections, etc.)
MySQL is made to handle "short connections" (connect, do your queries, disconnect; it has some performance issues with a lot of open connections)
The problem is that Django does not support connection pooling or persistent connections; it has to connect to and disconnect from the database on each view call.
It will work with PostgreSQL, but connecting to PostgreSQL costs a LOT more than connecting to a MySQL database (in PostgreSQL each connection has its own process, which is a lot slower than just spinning up a new thread in MySQL).
Then you get some features like the Query Cache that can be really useful in some cases. (But you lose the superb text search of PostgreSQL.)
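Note for readers on newer versions: Django 1.6 and later can keep connections open between requests through the `CONN_MAX_AGE` database setting, which reduces the per-request connection cost described above. A minimal settings sketch follows; the database name, user and host are placeholders, and 600 seconds is an arbitrary value, not a recommendation.

```python
# settings.py fragment (assumed project layout; all values are placeholders)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mydb",
        "USER": "myuser",
        "HOST": "localhost",
        # Reuse each connection for up to 600 seconds instead of
        # reconnecting on every request. 0 closes it after each request.
        "CONN_MAX_AGE": 600,
    }
}
```

This is persistent connections, not pooling; an external pooler is still a separate choice.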
When a migration fails in django-south, the developers encourage you not to use MySQL:
! The South developers regret this has happened, and would
! like to gently persuade you to consider a slightly
! easier-to-deal-with DBMS (one that supports DDL transactions)
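To show what "supports DDL transactions" buys you, here is a small sketch of a schema change being rolled back after a simulated failure. It uses Python's `sqlite3` module as a stand-in because SQLite also has transactional DDL and needs no server; PostgreSQL behaves the same way, whereas MySQL implicitly commits on DDL statements, so the half-applied changes would remain. The table names are made up.

```python
import sqlite3

# isolation_level=None puts the sqlite3 module in autocommit mode, so the
# explicit BEGIN/ROLLBACK below are the only transaction control in play.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY)")


def column_names(connection, table):
    return [row[1] for row in connection.execute(f"PRAGMA table_info({table})")]


conn.execute("BEGIN")
try:
    conn.execute("ALTER TABLE account ADD COLUMN email TEXT")
    conn.execute("CREATE TABLE audit_log (id INTEGER PRIMARY KEY)")
    raise RuntimeError("simulated failure halfway through the migration")
except RuntimeError:
    conn.execute("ROLLBACK")  # undoes both schema changes

columns_after_rollback = column_names(conn, "account")
tables_after_rollback = [
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
```

After the rollback the schema is exactly as it was before the migration started, so the migration can simply be fixed and run again.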
I went down the MySQL road because I was familiar with it (struggling to find a proper installer, plus a quick test of PostgreSQL's slow web "workbench" interface, put me off PostgreSQL). At the end of the project, a few months after deployment, while looking into backup options, I found that you have to pay for MySQL's enterprise backup features. A gotcha right at the very end.
With MySQL I had to write some ugly monster raw SQL queries in Django, because MySQL has no select-distinct-per-group for retrieving the latest row per group. I also looked at PostgreSQL's full-text search and wished I had used PostgreSQL.
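For context, this is roughly what a portable latest-per-group query looks like: a join against a grouped MAX subquery. The sketch runs on Python's `sqlite3` module, and the `reading` table and its data are invented for illustration. In PostgreSQL the same result can be written more compactly with `SELECT DISTINCT ON (sensor) ... ORDER BY sensor, taken_at DESC`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (sensor TEXT, taken_at TEXT, value REAL)")
conn.executemany(
    "INSERT INTO reading VALUES (?, ?, ?)",
    [
        ("a", "2020-01-01", 1.0),
        ("a", "2020-01-03", 3.0),
        ("b", "2020-01-02", 2.0),
        ("b", "2020-01-01", 9.0),
    ],
)

# For each sensor, find its newest timestamp, then join back to get the full row.
LATEST_PER_GROUP = """
SELECT r.sensor, r.taken_at, r.value
FROM reading AS r
JOIN (
    SELECT sensor, MAX(taken_at) AS latest
    FROM reading
    GROUP BY sensor
) AS m ON m.sensor = r.sensor AND m.latest = r.taken_at
ORDER BY r.sensor
"""

latest = conn.execute(LATEST_PER_GROUP).fetchall()
```

If two rows in a group share the same newest timestamp, this form returns both, so a tie-breaker column may be needed.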
I recommend PostgreSQL even if you are familiar with MySQL, but your mileage may vary.
UPDATE: DBeaver is a great equivalent of the MySQL Workbench GUI tool that works very nicely with PostgreSQL (and many others, as it's a universal DB tool).
To add to previous answers:
"Full text search might be better supported for MySQL"
The FULLTEXT index in MySQL is a joke.
It only works with MyISAM tables, so you lose ACID, Transactions, Constraints, Relations, Durability, Concurrency, etc.
An INSERT/UPDATE/DELETE on a largish TEXT column (like a forum post) will rebuild a large part of the index. If it does not fit in myisam_key_buffer, then heavy IO will occur. I've seen a single forum post insertion trigger 100MB or more of IO ... meanwhile the posts table is exclusively locked!
I did some benchmarking (3 years ago, may be stale...) which showed that on large datasets, basically postgres fulltext is 10-100x faster than mysql, and Xapian 10-100x faster than postgres (but not integrated).
Other reasons not mentioned are the extremely smart query optimizer, large choice of join types (merge, hash, etc), hash aggregation, gist indexes on arrays, spatial search, etc which can result in extremely fast plans on very complicated queries.
Will this application be hosted on your own servers or by a hosting company? Make sure that if you are using a hosting company, they support the database of choice.
There is a major licensing difference between the two databases that will affect you if you ever intend to distribute code using them. MySQL's client libraries are GPL, while PostgreSQL's are under a BSD-like license, which might be easier to work with.