MySQL vs PostgreSQL Concerns w/ GIS & Speed

I'm aware there are a few threads out there addressing this issue, but I'm wondering if anything has changed since those have been published.
I'm looking to build a GIS webapp, and everyone says PostgreSQL is the way to go because it has much better support for mapping-related features, whereas MySQL's spatial extensions aren't too great.
So PostgreSQL seems like the way to go, but everywhere I look I read that PostgreSQL is terribly slow compared to MySQL. Is this still true?
If I want to use GeoDjango with MySQL, will I be able to do most things?
I'm really stuck between the two, simply because people keep saying PostgreSQL is really slow, but MySQL isn't really great for dealing with GIS stuff.
What's your take SO?

No, PostgreSQL is not slower. This myth comes from people running single-threaded, sequential benchmarks of MyISAM vs PostgreSQL. Benchmarks that attempt to model actual usage conditions, with many concurrent queries, put PostgreSQL on par with or ahead of MySQL, especially as you scale up in CPUs/cores.
http://www.randombugs.com/linux/mysql-postgresql-benchmarks.html
http://tweakers.net/reviews/657/5

In my opinion, it's pointless to compare MySQL and PostgreSQL on raw speed while key variables are unknown: what's your budget, what's your target throughput, and what's your expected load?
Both RDBMSs are great, and both can be scaled. One difference is that MySQL has a pluggable storage engine architecture. Natively, MySQL ships with nine engines, if I'm not mistaken, and there is a plethora of commercial engines to choose from, along with two popular forks (Percona Server and MariaDB) that introduce various enhancements, especially to the InnoDB storage engine.
The real question is: what does it mean that something is "bad" at GIS "stuff"? Bad how? It can't calculate something? Can't store something? I just don't get what you consider bad, really.
I doubt you can go wrong by choosing either of the two databases, just beware of false benchmarks claiming one product is faster than another. Set your goal in terms of performance, install both products on your test machine and run them. If both satisfy your performance needs, use the one you feel more comfortable developing with.

Check this topic: GIS: PostGIS/PostgreSQL vs. MySql vs. SQL Server?
PostGIS is much more mature and complete, and competes with Oracle and SQL Server, not MySQL. Sorry.

When it comes to GIS capabilities, have a look at this GIS SE question:
Would PostGIS offer an advantage over MySQL for a produce farm application?
I think that, from all I've read here and on the GIS SE site, PostgreSQL with PostGIS is the clear winner when it comes to handling spatial data.
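To make the difference concrete, here's a minimal GeoDjango sketch, assuming the PostGIS backend; the Landmark model and the coordinates are invented for illustration. Distance lookups like this one are exactly the kind of thing that works on PostGIS but has historically been unavailable on MySQL's spatial backend:

    # Minimal GeoDjango sketch, assuming the PostGIS backend
    # (django.contrib.gis.db.backends.postgis). The Landmark model
    # and the coordinates are invented for illustration.
    from django.contrib.gis.db import models
    from django.contrib.gis.geos import Point
    from django.contrib.gis.measure import D  # distance helper

    class Landmark(models.Model):
        name = models.CharField(max_length=100)
        location = models.PointField(srid=4326)  # WGS 84 lon/lat

    # Find landmarks within 5 km of a point -- a distance lookup
    # that PostGIS supports and MySQL's backend does not.
    nearby = Landmark.objects.filter(
        location__distance_lte=(Point(-73.98, 40.75, srid=4326), D(km=5))
    )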

Related

Use Hadoop as MySQL storage engine?

Besides using Hive, is this a good way to let SQL programmers run ad hoc queries against large-scale log data on HDFS?
Is there any similar open-source implementation?
I searched for this in 2014 and found InfiniDB, along with a blog post about it. It integrates Hadoop and MySQL, providing native MySQL protocol access to data stored in Hadoop.
I haven't read much about it, and I have doubts about its compatibility (with existing MySQL applications) and performance (compared to well-tuned indexes and data partitioning).
But it might be the easiest route to high availability for a really large dataset that cannot fit on a few disks (using HDFS's built-in replication, no SAN or RAID needed).
BTW, the InfiniDB website is currently affected by the Heartbleed bug. I wonder how secure their product is if they still haven't patched Heartbleed more than 5 months on.
Technically it should not be that complicated to implement. The conceptual problem I see is that the performance behavior of NoSQL engines is fundamentally different from what the MySQL engine expects of its storage engines. Specifically, they have good random access but are not very efficient at full or range scans. The question is whether all these costs can be conveyed to the optimizer. This applies to any RDBMS engine; many of them have a concept of pluggable storage engines, with differing levels of flexibility and documentation.
I think that for such an integration to be efficient, we need to be able to push predicates down to the NoSQL engine for full/range scans. I am not 100% sure MySQL supports that at the storage engine interface level.
Another serious problem I see with this approach is that MySQL does not have parallel query execution, and therefore cannot be very good at processing big data.
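To illustrate the pushdown point with a toy sketch (this is not the real MySQL handler API, which is a C++ interface; all names here are invented):

    # Toy illustration of predicate pushdown; not the real MySQL
    # storage engine API (a C++ interface). All names are invented.
    class RemoteStore:
        """Stands in for an HDFS/NoSQL-backed storage engine."""

        def __init__(self, rows):
            self.rows = rows

        def scan_all(self):
            # No pushdown: every row crosses the storage/SQL boundary,
            # and the SQL layer filters afterwards.
            yield from self.rows

        def scan_filtered(self, predicate):
            # Pushdown: the predicate runs inside the storage layer,
            # so only matching rows cross the boundary.
            yield from (r for r in self.rows if predicate(r))

    store = RemoteStore([{"url": "u%d" % i, "hits": i} for i in range(1000000)])

    # Without pushdown, 1,000,000 rows are transferred and filtered late:
    hot = [r for r in store.scan_all() if r["hits"] > 999990]

    # With pushdown, only the 9 matching rows are transferred:
    hot = list(store.scan_filtered(lambda r: r["hits"] > 999990))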

What database works well with 200+GB of data?

I've been using MySQL (with InnoDB, on Amazon RDS) because it's the sort-of universal default, but it's been ridiculously underperforming, and tweaking it only delays the inevitable.
The data is mostly relatively short blobs (<1 kB each) of information about hundreds of millions of URLs. There is (or should be; MySQL can't seem to handle it) a very high volume of inserts/updates/retrievals but few complex queries - not that complex queries wouldn't be useful, but MySQL is so slow that it's far faster to get the data out, process it locally, and cache the results somewhere.
I can keep tweaking mysql and throwing more hardware at it, but it seems increasingly futile.
So what are the options? SQL/relational model/etc. optional - anything will do as long as it's fast, networked, and language-independent.
Have you done any sort of end-to-end profiling of your application and MySQL database? To provide better advice it would also be good to understand what improvements you have tried to implement, and your database structure. You haven't given a lot of information on how your MySQL database is configured either. It provides a lot of options for tuning.
You should pick up a copy of High Performance MySQL if you haven't already to learn more about the product.
There is no point in doing anything until you know what your problem is. NoSQL solutions can offer performance benefits but you have provided little evidence that MySQL is incapable of servicing your needs.
Well "Fast, networked and language-independent" + "few complex queries" brings to mind the various NoSQL solutions. To name a few:
MongoDB
CouchDB
Cassandra
And if that's not fast enough, there is always the wicked-fast Redis, which is my personal favorite atm. :) It is not a database per se, but it's good enough for most scenarios.
I am sure other people can list more NoSQL databases...
and there is always http://nosql-database.org/ .
Generally speaking, databases in this category are better and faster for your scenario because they relax constraints, which makes frequent inserts/updates/retrievals easier and faster. But that requires you to think harder about your data model, and it is generally not possible to run SQL-style complex queries directly -- you'll instead write more pre-computed data or use a more denormalized design to make up for the lack of complex queries.
But since complex queries are a minor concern in your case, I think NoSQL solutions are ideal for you.
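For example, your workload (many small per-URL blobs with heavy insert/update/get traffic) maps naturally onto a key-value store. A minimal sketch with redis-py; the key scheme and the fields are made up for illustration:

    # Minimal sketch of the workload on Redis via redis-py; the key
    # scheme and the fields are made up for illustration.
    import json
    import redis

    r = redis.Redis(host="localhost", port=6379, db=0)

    def save_url_info(url, info):
        # One small JSON blob per URL; SET is O(1) and very fast.
        r.set("url:" + url, json.dumps(info))

    def load_url_info(url):
        raw = r.get("url:" + url)
        return json.loads(raw) if raw is not None else None

    save_url_info("http://example.com", {"status": 200, "title": "Example"})
    print(load_url_info("http://example.com"))

Keep in mind that Redis holds the dataset in RAM, so at 200+ GB you'd need to shard across machines.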
With the data you've given about your application's data and workload, it is almost impossible to determine whether the problem really is MySQL itself or something else. You seem to assume that you can throw any workload at a relational engine and it should handle it. The suggestions made by other commenters about analyzing the performance more carefully are therefore valid in my opinion. Without more data (transactions per second, etc.) any further analysis of other suitable engines is also futile.
I'm not sure I agree with the advice to jump ship from traditional databases. It might not be the most efficient tool, but it is the one that is FAR more widely understood and used, and I strongly doubt you have a problem that can't be handled by an efficiently set-up relational database.
Obvious answers are Oracle, SQL Server, etc., but it might just be that your database structure isn't right. I don't know much about MySQL, but I do know it's used in some pretty big projects (eBay being noteworthy).

any formal benchmarking of Open source Database software?

Are there any formal performance and stress test reports for open source databases, especially SQLite, MySQL and PostgreSQL?
I want to use SQLite on a server for its simple structure and easy embeddability, but I cannot find any pros and cons (by Googling and Yahoo!ing) regarding the performance of these databases.
Please suggest.
I found this article. It has a disclaimer at the top about the age of the information. However, it may be some help to you.
Here is another article that seems a little more recent and up to date.
Seems from reading these that SQLite is quite adequate in terms of performance.
Sysbench is a great utility for benchmarking mysql and I believe has plugins or the capability to test PostgreSQL. Keep in mind that you're not going to get a simple number that says "DBMS A is faster than DBMS B" -- at best you can hope to get an idea of what kind of scaling you'll get for a particular type of workload that is hopefully similar to whatever workload you'll end up throwing at your system.
Regardless of performance, if you really know what you are doing with RDBMS software and need an open source solution, you'll probably want to go with PostgreSQL -- otherwise, stick with MySQL.
Benchmarks are not the most important factor in database choice.
I think SQLite and MySQL are quicker than PostgreSQL or Firebird, but if you need specific features like CTEs, only a few databases have them, even though they are in the SQL standard.
Benchmarking is hard. And expensive. And in most installations where benchmarks are run, SQLite won't even be tested, because it's designed for completely different workloads and simply doesn't fit the situation. (For example, any real benchmark will have clients running on machines separate from the server, which SQLite AFAIK doesn't really do - whereas it does very well in the case where you have a single local client.)
You can always look at something like spec, for example http://www.spec.org/jAppServer2004/results/jAppServer2004.html that shows both pg and mysql at least. But beware that the hardware platforms are different (and that these tests are also not from today).
But the bottom line is that if you want to compare performance for your application, the only really relevant benchmark you can run is your own application in a testing environment.
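If you go that route, the harness can be trivial. A sketch; the connection factory and the query are placeholders for your own application's workload:

    # Trivial benchmark harness sketch; connect_fn and the query are
    # placeholders for your own application's workload.
    import time

    def bench(connect_fn, query, params, iterations=1000):
        conn = connect_fn()
        cur = conn.cursor()
        start = time.perf_counter()
        for _ in range(iterations):
            cur.execute(query, params)
            cur.fetchall()
        return time.perf_counter() - start

    # e.g. bench(lambda: psycopg2.connect("dbname=test"),
    #            "SELECT * FROM orders WHERE customer_id = %s", (42,))

Run the same harness against each candidate backend with realistic data volumes and concurrency before drawing conclusions.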

Where to find a good reference when choosing a database?

Two others and I are working on a project at the university.
In the project we are building a prototype of an MMORPG.
We have decided to use PostgreSQL as our database. The other databases we considered were MS SQL-server and MySQL.
Does anybody have a good reference that would justify our choice? (Preferably written within the last year.)
Someone recently recommended wikivs.com to me: MySQL vs. PostgreSQL - it is a quite detailed comparison of the two, and might be of help to you.
The most-mentioned difference between MySQL and PostgreSQL concerns your read/write ratio. If you read a lot more than you write, MySQL is usually faster; but if you do a lot of heavy updates to a table while other threads have to read it just as often, then MySQL's default locking is not the best, and PostgreSQL can be a better choice, performance-wise.
IOW, PostgreSQL scales better with respect to DB writes.
That's why it's usually said that MySQL is best for webapps, while PostgreSQL is more 'enterprisey'.
Of course, the picture is not so simple:
InnoDB tables on MySQL have a very different performance behaviour
At the load levels where PostgreSQL's better locks overtake MySQL's, other parts of your platform could be the bottlenecks.
PostgreSQL does comply better with standards, so it can be easier to replace later.
In the end, the choice involves so many variables that whichever way you go, you'll find some important issue that makes it the right choice.
Go with something that someone in your team has actual experience of using in production. All databases have issues which frequent users are aware of.
I cannot stress enough that someone in the team needs PRODUCTION experience of using it. Not using it for their homework, or to keep their list of CDs in.
All of these databases have their advantages and disadvantages. Which is better is dependent on:
Your team's experience
Your exact requirements
Your current environment, e.g. what's your app written in, and what will it be hosted on?
SQL Server's main problem is cost, unless you use the Express edition, which has performance limitations; however, it's very easy to use and has a number of good tools.
There is a comparison of the different SQL Server editions at:
http://www.microsoft.com/sql/prodinfo/features/compare-features.mspx
You could then compare these with MySQL and PostgreSQL.
If the purpose of this comparison is a theoretical one for your essay, then you can reference web pages such as the Microsoft link above and compare performance, cost, etc.
Postgresql has a page of case studies that you can quote and link to.
Really, any of the above would have worked for you. I personally like PostgreSQL. One solid advantage it has over MSSQL (even assuming you can get it for "free") is that PostgreSQL is non-proprietary. If you're going to introduce a dependency into your project (and re-inventing an RDBMS would be crazy), you don't want it to be a black box.

MySQL vs PostgreSQL? Which should I choose for my Django project?

My Django project is going to be backed by a large database with several hundred thousand entries, and will need to support searching (I'll probably end up using djangosearch or a similar project.)
Which database backend is best suited to my project and why? Can you recommend any good resources for further reading?
For whatever it's worth, the creators of Django recommend PostgreSQL.
If you're not tied to any legacy system and have the freedom to choose a database back-end, we recommend PostgreSQL, which achieves a fine balance between cost, features, speed and stability. (The Definitive Guide to Django, p. 15)
As someone who recently switched a project from MySQL to Postgresql I don't regret the switch.
The main difference, from a Django point of view, is more rigorous constraint checking in Postgresql, which is a good thing, and also it's a bit more tedious to do manual schema changes (aka migrations).
There are probably 6 or so Django database migration applications out there and at least one doesn't support Postgresql. I don't consider this a disadvantage though because you can use one of the others or do them manually (which is what I prefer atm).
Full-text search might be better supported for MySQL: MySQL has built-in full-text search usable from within Django, but it's pretty useless (no word stemming, no phrase searching, etc.). I've used django-sphinx as a better option for full-text searching in MySQL.
Full text searching is built-in with Postgresql 8.3 (earlier versions need TSearch module). Here's a good instructional blog post: Full-text searching in Django with PostgreSQL and tsearch2
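For the curious, a minimal raw-SQL sketch of what that looks like through psycopg2 against PostgreSQL 8.3+; the posts table and its columns are hypothetical:

    # Minimal PostgreSQL full-text search sketch via psycopg2;
    # the posts table and its columns are hypothetical.
    import psycopg2

    conn = psycopg2.connect("dbname=mydb user=me")
    cur = conn.cursor()

    # to_tsvector/to_tsquery provide stemming and ranking -- the
    # features MySQL's built-in search lacks.
    cur.execute(
        """
        SELECT id, title,
               ts_rank(to_tsvector('english', body),
                       to_tsquery('english', %s)) AS rank
        FROM posts
        WHERE to_tsvector('english', body) @@ to_tsquery('english', %s)
        ORDER BY rank DESC
        LIMIT 10
        """,
        ("django & search", "django & search"),
    )
    for row in cur.fetchall():
        print(row)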
large database with several hundred thousand entries
This is not a large database; it's a very small one.
I'd choose PostgreSQL, because it has a lot more features. Most significant in this case: in PostgreSQL you can use Python as a procedural language.
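For instance, a sketch of a PL/Python function; this assumes the plpythonu language is installed in the database, and the slugify function is invented:

    # Sketch: define and call a PL/Python function through psycopg2.
    # Assumes PL/Python (plpythonu) is installed; slugify is invented.
    import psycopg2

    conn = psycopg2.connect("dbname=mydb user=me")
    cur = conn.cursor()

    cur.execute("""
    CREATE OR REPLACE FUNCTION slugify(title text) RETURNS text AS $$
        # This body is Python, executing inside PostgreSQL.
        return title.lower().replace(" ", "-")
    $$ LANGUAGE plpythonu;
    """)
    conn.commit()

    cur.execute("SELECT slugify(%s)", ("Hello World",))
    print(cur.fetchone()[0])  # -> hello-world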
Go with whichever you're more familiar with. MySQL vs PostgreSQL is an endless war. Both of them are excellent database engines and both are being used by major sites. It really doesn't matter in practice.
All the answers bring interesting information to the table, but some are a little outdated, so here's my grain of salt.
As of 1.7, migrations are now an integral feature of Django. So they documented the main differences that Django developers might want to know beforehand.
Backend Support
Migrations are supported on all backends that Django ships with, as well as any third-party backends if they have programmed in support for schema alteration (done via the SchemaEditor class).
However, some databases are more capable than others when it comes to schema migrations; some of the caveats are covered below.
PostgreSQL
PostgreSQL is the most capable of all the databases here in terms of schema support.
MySQL
MySQL lacks support for transactions around schema alteration operations, meaning that if a migration fails to apply you will have to manually unpick the changes in order to try again (it’s impossible to roll back to an earlier point).
In addition, MySQL will fully rewrite tables for almost every schema operation and generally takes a time proportional to the number of rows in the table to add or remove columns. On slower hardware this can be worse than a minute per million rows - adding a few columns to a table with just a few million rows could lock your site up for over ten minutes.
Finally, MySQL has relatively small limits on name lengths for columns, tables and indexes, as well as a limit on the combined size of all columns an index covers. This means that indexes that are possible on other backends will fail to be created under MySQL.
SQLite
SQLite has very little built-in schema alteration support, and so Django attempts to emulate it by:
Creating a new table with the new schema
Copying the data across
Dropping the old table
Renaming the new table to match the original name
This process generally works well, but it can be slow and occasionally buggy. It is not recommended that you run and migrate SQLite in a production environment unless you are very aware of the risks and its limitations; the support Django ships with is designed to allow developers to use SQLite on their local machines to develop less complex Django projects without the need for a full database.
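For reference, the same four steps can be sketched by hand with Python's stdlib sqlite3 module; the table and columns here are invented:

    # The create/copy/drop/rename pattern above, done by hand with
    # the stdlib sqlite3 module; table and columns are invented.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO author (name) VALUES ('Ada'), ('Linus');

    -- 1. New table with the new schema (adds a 'bio' column)
    CREATE TABLE author__new (
        id INTEGER PRIMARY KEY,
        name TEXT,
        bio TEXT DEFAULT ''
    );
    -- 2. Copy the data across
    INSERT INTO author__new (id, name) SELECT id, name FROM author;
    -- 3. Drop the old table
    DROP TABLE author;
    -- 4. Rename the new table to the original name
    ALTER TABLE author__new RENAME TO author;
    """)
    print(conn.execute("SELECT id, name, bio FROM author").fetchall())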
Even if PostgreSQL looks better, I find it has some performance issues with Django:
PostgreSQL is made to handle "long connections" (connection pooling, persistent connections, etc.)
MySQL is made to handle "short connections" (connect, run your queries, disconnect; it has some performance issues with a lot of open connections)
The problem is that Django does not support connection pooling or persistent connections; it has to connect/disconnect to the database on each view call.
It will work with PostgreSQL, but connecting to PostgreSQL costs a LOT more than connecting to a MySQL database (on PostgreSQL, each connection gets its own process, which is a lot slower than just spawning a new thread in MySQL).
Then you get features like the query cache that can be really useful in some cases. (But you lose the superb text search of PostgreSQL.)
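Worth noting: Django 1.6 and later added persistent connections via the CONN_MAX_AGE setting, which softens this particular complaint. A settings.py sketch; the credentials are placeholders:

    # settings.py sketch: persistent connections in Django 1.6+,
    # which avoid the per-request connect/disconnect cost described
    # above. Credentials are placeholders.
    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.postgresql_psycopg2",
            "NAME": "mydb",
            "USER": "me",
            "PASSWORD": "secret",
            "HOST": "localhost",
            "PORT": "5432",
            # Keep each connection open for up to 10 minutes instead
            # of opening a fresh one per request (0 = old behavior).
            "CONN_MAX_AGE": 600,
        }
    }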
When a migration fails in django-south, the developers encourage you not to use MySQL:
! The South developers regret this has happened, and would
! like to gently persuade you to consider a slightly
! easier-to-deal-with DBMS (one that supports DDL transactions)
I went down the MySQL road because I was familiar with it (and because I struggled to find a proper installer, and a quick test of PostgreSQL's slow web "workbench" interface put me off). At the end of the project, a few months after deployment, while looking into backup options, I discovered you have to pay for MySQL's enterprise backup features. Gotcha right at the very end.
With MySQL I had to write some ugly, monstrous raw SQL queries in Django, because there is no SELECT DISTINCT per group for retrieving the latest row per group (see the sketch below). I'm also looking at PostgreSQL's full-text search and wishing I had used PostgreSQL.
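A sketch of the difference; the readings table and its columns are invented. PostgreSQL's DISTINCT ON answers "latest row per group" directly, while MySQL at the time needed a workaround such as an anti-join:

    # "Latest row per group" in both dialects; the readings table and
    # its columns are invented for illustration.

    # PostgreSQL: DISTINCT ON does it directly.
    LATEST_PER_GROUP_PG = """
        SELECT DISTINCT ON (device_id) device_id, recorded_at, value
        FROM readings
        ORDER BY device_id, recorded_at DESC;
    """

    # MySQL workaround: anti-join against any newer row per group.
    LATEST_PER_GROUP_MYSQL = """
        SELECT r.device_id, r.recorded_at, r.value
        FROM readings r
        LEFT JOIN readings newer
          ON newer.device_id = r.device_id
         AND newer.recorded_at > r.recorded_at
        WHERE newer.device_id IS NULL;
    """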
I recommend PostgreSQL even if you are familiar with MySQL, but your mileage may vary.
UPDATE: DBeaver is a great equivalent of the MySQL Workbench GUI tool, and it works with PostgreSQL very nicely (and many others, as it's a universal DB tool).
To add to the previous answers:
"Full text search might be better supported for MySQL"
The FULLTEXT index in MySQL is a joke.
It only works with MyISAM tables, so you lose ACID, Transactions, Constraints, Relations, Durability, Concurrency, etc.
An INSERT/UPDATE/DELETE on a largish TEXT column (like a forum post) will rebuild a large part of the index. If the index does not fit in myisam_key_buffer, large IO will occur. I've seen a single forum post insertion trigger 100 MB or more of IO ... meanwhile the posts table is exclusively locked!
I did some benchmarking (3 years ago, may be stale...) which showed that on large datasets, basically postgres fulltext is 10-100x faster than mysql, and Xapian 10-100x faster than postgres (but not integrated).
Other reasons not yet mentioned are the extremely smart query optimizer, the large choice of join types (merge, hash, etc.), hash aggregation, GiST indexes on arrays, spatial search, etc., which can result in extremely fast plans for very complicated queries.
Will this application be hosted on your own servers or by a hosting company? Make sure that if you are using a hosting company, they support the database of choice.
There is a major licensing difference between the two DBs that will affect you if you ever intend to distribute code using them: MySQL's client libraries are GPL, while PostgreSQL's are under a BSD-like license, which might be easier to work with.