PostgreSQL vs MySQL when working with GeoDjango in Django

There are multiple tutorials and questions across the Internet, YouTube, and Stack Overflow about finding nearby businesses given a location, for example (a question on Stack Overflow):
Returning nearby locations in Django
But one thing common to them all is that they all prefer PostgreSQL (instead of MySQL) for Django's GeoDjango library.
I am building a project as:
Here a user can register as a customer as well as a business (the customer's and the business's name/address and all other fields will be separate, even if it's the same user)
This is not how the database is actually structured; it is only a rough idea of the project
Both customer and business will have their locations stored
Customers can find nearby businesses around them
I was wondering what the specific advantages of using PostgreSQL over MySQL are in the context of computing and fetching the location-related fields.
(MySQL has been a well-tested database for years, and most of my data is relational, so I was planning to use MySQL or Microsoft SQL Server.)
Would there be any processing disadvantages in the context of the algorithms used to compute nearby businesses if I choose to go with MySQL? How would it make my system slow?

But one thing common to them all is that they all prefer PostgreSQL (instead of MySQL) for Django's GeoDjango library.
The reason why they suggest using Postgres is that it has better support for spatial data. It's not that MySQL doesn't support spatial data. However, there is a long list of features which Postgres supports and MySQL doesn't. You can look at this page for details. Almost every time MySQL is mentioned on that page, it is to describe a feature that it does not support, but that Postgres does.
(MySQL has been a well-tested database for years, and most of my data is relational, so I was planning to use MySQL or Microsoft SQL Server.)
Note that foreign key constraints are not compatible with MyISAM, which is the only MySQL database engine that supports spatial indexes. So if you pick MySQL, you need to choose between referential integrity and fast spatial lookups.
If you use Postgres, you can have both referential integrity and fast spatial lookups. Postgres is also a quite mature and widely used relational database these days.
Would there be any processing disadvantages in the context of the algorithms used to compute nearby businesses if I choose to go with MySQL? How would it make my system slow?
It really depends on how many businesses you're searching for. If you pick an engine that does not support spatial indexes, MySQL is forced to do a full table scan, which takes O(N) time. On the other hand, it can do bounding box comparisons to eliminate many geometries quite quickly. I have seen acceptable interactive performance for 100k points, with performance dropping off after that. In contrast, Postgres with a spatial index is fast for any number of points.
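For concreteness, here is a minimal GeoDjango sketch of such a nearby-business lookup against a PostGIS backend; the Business model, the field names, and the 5 km radius are hypothetical:

# models.py -- requires a spatial backend such as PostGIS
from django.contrib.gis.db import models

class Business(models.Model):
    name = models.CharField(max_length=100)
    location = models.PointField(geography=True)  # spatially indexed by default

# Query: businesses within 5 km of the user, nearest first
from django.contrib.gis.db.models.functions import Distance
from django.contrib.gis.geos import Point
from django.contrib.gis.measure import D

user_location = Point(77.2090, 28.6139, srid=4326)  # longitude, latitude
nearby = (Business.objects
          .filter(location__distance_lte=(user_location, D(km=5)))
          .annotate(distance=Distance("location", user_location))
          .order_by("distance"))

With a spatial index in place, the database can satisfy the distance filter without scanning every row, which is exactly the O(N) cost described above.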

Related

Database for Full Text Search and 200M+ Records

I am about to create a huge database with at least 200 million entries.
The database needs to be searchable using full text and should be fast.
My database gets data from many different data sources, and I need to import the new or updated data regularly.
Is it a good idea to store all my data in a relational database like MySQL and then create a NoSQL document database (e.g. MongoDB or Elasticsearch) just for the purpose of searching, or does that not provide any benefit in terms of reliability and the prevention of redundant information?
I believe that keeping primary records in a SQL database and duplicating them to a NoSQL database is a very common approach.
ElasticSearch has an ongoing status page about its resiliency. Even in the newest version, ElasticSearch can lose data in a number of different situations. A major change in the structure of an ElasticSearch index (such as adding analyzers) requires that you re-index all of the documents. This process is safer if you have another source for the documents. At the end of the day, ElasticSearch isn't designed to consistently store documents - I would only ever choose to use ElasticSearch as the primary store in situations where occasional data loss isn't a disaster.
Unlike ElasticSearch, MongoDB is designed to be resilient. You should be able to safely store documents in MongoDB. I've found trying to do full text searches in MongoDB can be a little painful, at least compared to ElasticSearch. In my opinion, for text search, the only advantage MongoDB has over MySQL's FULLTEXT is that it is distributed.
We are running ElasticSearch and MySQL right now - and the benefits greatly outweigh the hassles of extra infrastructure and dealing with replication between the two. We had previously attempted to use a NoSQL solution as the primary datastore, with disastrous results. Running ES in conjunction with MySQL gets you the best of both worlds - the consistency and safety of data in SQL, with the scalable, effective full text search of ES.
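As a rough illustration of that arrangement (not their production code), here is a minimal sketch that treats MySQL as the source of truth and re-indexes changed rows into ElasticSearch; the table, index name, connection details, and updated_at column are all hypothetical, and the exact client call signature varies between elasticsearch-py versions:

import mysql.connector
from elasticsearch import Elasticsearch

db = mysql.connector.connect(host="localhost", user="app", password="secret", database="app")
es = Elasticsearch("http://localhost:9200")

last_sync = "2024-01-01 00:00:00"  # hypothetical watermark from the previous run
cursor = db.cursor(dictionary=True)
cursor.execute("SELECT id, title, body FROM posts WHERE updated_at > %s", (last_sync,))

for row in cursor:
    # The document id mirrors the MySQL primary key, so re-running the sync is idempotent
    es.index(index="posts", id=row["id"],
             document={"title": row["title"], "body": row["body"]})

Because the SQL database remains authoritative, the whole index can be rebuilt from scratch whenever the ElasticSearch mapping changes.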
I don't know how applicable to your situation this is, but Evan Weaver compared a few of the common Rails search options (Sphinx, Ferret and Solr), running some benchmarks.

Solr-ish Query API on top of relational database

I have a data source sitting in a relational database. I managed to index/store everything into Solr and was thrilled to see the search performance and the awesome API (search/admin, etc.).
However, people say that if your data is truly structured, a relational database should be fast if you index everything. But even if I dump all the data into a relational database like MySQL, what I am missing is the beautiful query API.
I guess my questions are:
Is it possible to use only the Solr-ish query API, with the relational database as the backend, instead of using an index at all?
If that is not possible, is there any mature project/product that can build a full-stack query API on a relational database?
Document search engines and relational databases serve different usage patterns. If you're using Solr for anything that involves tokenization and analysis chains, replicating that in an RDBMS requires implementing that functionality yourself (or using just a subset, such as the full text indexes in certain RDBMSes). I detailed some of these differences and features in Should I just query the database or use a proper search engine solution?.
It's usually better to use the RDBMS as the main storage for your data and then push it into the search index as required. This will also let you get new features from those who care about search and the problem it tries to solve, without having to wait for a niche product to implement them on top of your RDBMS (there are still quite a few new features in each iteration of Lucene, Elastic, and Solr).

percona nosql vs other nosql

I am evaluating NoSQL stores for storing key/value pairs (for a part of the application), and came across Percona, which offers native key/value access within the MySQL world. It seems a good solution, as it allows the storage to remain in a single place (since the rest of the functionality exists in MySQL and would continue as-is). Are there any advantages over other key/value stores such as Cassandra? What are the disadvantages?
You're referring to the HandlerSocket interface, which bypasses the SQL query layer and allows you to fetch and store rows in a single InnoDB table by primary key. The idea is that avoiding the overhead of SQL allows applications to run a much higher rate of QPS.
HandlerSocket shows promise, but so far what we've found (I work for Percona) is that the bottleneck is the hastily-written client interfaces. That is, the client API for PHP, Ruby, etc. in their current state of implementation have such overhead that HandlerSocket is no faster than writing simple SQL statements for INSERT and SELECT. InnoDB is optimized for primary key access already, since the tables are really stored as clustered indexes by primary key.
Future development on writing optimized code for the HandlerSocket client libraries should improve this over time. If you want to help this process along, get involved in the open-source projects to develop those client libraries.
Another drawback of HandlerSocket is that AFAIK, it doesn't support in-place incrementing of values, which is an optimization some other key/value stores offer. With HandlerSocket, you'd have to fetch the value, read it, increment it, then post it back to the database. This introduces a race condition, so you'd have to lock the row somehow.
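To make that race concrete, here is a sketch of the unsafe fetch-increment-store cycle alongside the standard SQL fixes; the kv table and column names are hypothetical, and plain SQL over a DB-API connection stands in for the HandlerSocket client:

import mysql.connector

conn = mysql.connector.connect(host="localhost", user="app", password="secret", database="app")
cur = conn.cursor()

# UNSAFE: two clients can both read the same value and both write value + 1,
# silently losing one increment -- the race HandlerSocket forces on you.
cur.execute("SELECT counter FROM kv WHERE k = %s", ("page_views",))
(value,) = cur.fetchone()
cur.execute("UPDATE kv SET counter = %s WHERE k = %s", (value + 1, "page_views"))
conn.commit()

# SAFE alternative 1: an atomic in-place increment in a single statement.
cur.execute("UPDATE kv SET counter = counter + 1 WHERE k = %s", ("page_views",))
conn.commit()

# SAFE alternative 2: lock the row for the duration of the transaction (InnoDB).
cur.execute("SELECT counter FROM kv WHERE k = %s FOR UPDATE", ("page_views",))
(value,) = cur.fetchone()
cur.execute("UPDATE kv SET counter = %s WHERE k = %s", (value + 1, "page_views"))
conn.commit()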

GIS: PostGIS/PostgreSQL vs. MySql vs. SQL Server? [closed]

EDIT: I have been using Postgres with PostGIS for a few months now, and I am satisfied.
I need to analyze a few million geocoded records, each of which will have latitude and longitude. These records include data of at least three different types, and I will be trying to see if each set influences the others.
What database is best for the underlying data store for all this data? Here are my desires:
I'm familiar with the DBMS. I'm weakest with PostgreSQL, but I am willing to learn if everything else checks out.
It does well with GIS queries. Google searches suggest that PostgreSQL + PostGIS may be the strongest? At least a lot of products seem to use it. MySQL's Spatial Extensions seem comparatively minimal?
Low cost. Despite the 10GB DB limit in SQL Server Express 2008 R2, I'm not sure I want to live with this and other limitations of the free version.
Not antagonistic with the Microsoft .NET Framework. Thanks to Connector/Net 6.3.4, MySQL works well with C# and .NET Framework 4 programs. It fully supports .NET 4's Entity Framework. I cannot find any noncommercial PostgreSQL equivalent, although I'm not opposed to paying $180 for Devart's dotConnect for PostgreSQL Professional Edition.
Compatible with R. It appears all 3 of these can talk with R using ODBC, so may not be an issue.
I've already done some development using MySql, but I can change if necessary.
I have worked with all three databases and done migrations between them, so hopefully I can still add something to an old post. Ten years ago I was tasked with loading a largish dataset -- 450 million spatial objects -- from GML into a spatial database. I decided to try out MySQL and Postgis; at the time there was no spatial support in SQL Server, and we had a small startup atmosphere, so MySQL seemed a good fit. I subsequently got involved in the MySQL community, attended/spoke at a couple of conferences, and was heavily involved in the beta testing of the more GIS-compliant functions that were finally released with version 5.5. I have since been involved with migrating our spatial data to Postgis and our corporate data (with spatial elements) to SQL Server. These are my findings.
MySQL
1). Stability issues. Over the course of 5 years, we had several database corruption issues, which could only be fixed by running myisamchk on the index file, a process that can take well over 24 hours on a 450 million row table.
2). Until recently, only MyISAM tables supported the spatial data type. This means that if you want transaction support, you are out of luck. The InnoDB table type does now support spatial types, but not indexes on them, which, given the typical sizes of spatial data sets, isn't terribly useful. See http://dev.mysql.com/doc/refman/5.0/en/innodb-restrictions.html My experience from going to conferences was that spatial was very much an afterthought -- replication, partitioning, etc., had been implemented, but they didn't work with spatial.
EDIT: In the upcoming 5.7.5 release InnoDB will finally support indexes on spatial columns, meaning that ACID, foreign keys and spatial indexes will finally be available in the same engine.
3). The spatial functionality is extremely limited compared to both Postgis and SQL Server spatial. There is still no ST_Union function that acts on an entire geometry field, one of the queries I run most often; i.e., you can't write:
select some_attribute, ST_Union(geom) from some_table group by some_attribute
which is very useful in a GIS context. Being restricted to select ST_Union(geom1, const_geom) from some_table -- i.e., where one of the geometries is a hard-coded constant -- is a bit limiting in comparison.
4). No support for rasters. Being able to do combined vector-raster analysis within a db is very useful GIS functionality.
5). No support for conversion from one spatial reference system to another.
6). Since the acquisition by Oracle, spatial has really been put on hold.
Overall, to be fair to MySQL it supported our website, WMS and general spatial processing for several years, and was easy to set up. On the downside, data corruption was an issue, and by being forced to use MyISAM tables you are giving up a lot of the benefits of an RDBMS.
Postgis
Given the issues we had with MySQL, we ultimately converted to Postgis. The key points of this experience have been.
1). Extreme stability. No data corruption in 5 years, and we now have around 25 Postgres/GIS boxes on CentOS virtual machines, under varying degrees of load.
2). Rapid pace of development -- raster, topology, 3D support being recent examples of this.
3). Very active community. The Postgis irc channel and mailing list are excellent resources. The Postgis reference manual is also excellent. http://postgis.net/docs/manual-2.0/
4). Plays very well with other applications, under the OSGeo umbrella, such as GeoServer and GDAL.
5). Stored procedures can be written in many languages, apart from the default plpgsql, such as Python or R.
6). Postgres is a very standards-compliant, fully featured RDBMS, which aims to stay close to the ANSI standards.
7). Support for window functions and recursive queries -- not in MySQL, but in SQL Server. This has made writing more complex spatial queries cleaner.
SQL Server.
I have only used SQL Server 2008 spatial functionality, and many of the annoyances of that release -- lack of support for conversions from one CRS to another, the need to add your own parameters to spatial indexes -- have now been resolved.
1). As spatial objects in SQL Server are basically CLR objects, the syntax feels backwards. Instead of ST_Area(geom) you write geom.STArea() and this becomes even more obvious when you chain functions together. The dropping of the underscore in function names is merely a minor annoyance.
2). I have had a number of invalid polygons that have been accepted by SQL Server, and the lack of an ST_MakeValid function can make this a bit painful.
3). Windows only. In general, Microsoft products (like ESRI ones) are designed to work very well with each other, but don't always have standards compliance and interoperability as primary objectives. If you are running a Windows-only shop, this is not an issue.
UPDATE: Having played a bit with SQL Server 2012, I can say that it has been improved significantly. There is now a good geometry validation function, and there is good support for the Geography data type, including a FULL GLOBE object (which allows representing objects that occupy more than one hemisphere) and support for Compound Curves and Circular Strings, which are useful for accurate and compact representations of arcs (and circles), among other things. Transforming coordinates from one CRS to another still needs to be done in 3rd-party libraries, though this is not a show-stopper in most applications.
I haven't used SQL Server with large enough datasets to compare one on one with Postgis/MySQL, but from what I have seen the functions behave correctly, and while not quite as fully featured as Postgis, it is a huge improvement on MySQL's offerings.
Sorry for such a long answer, I hope some of the pain and joy I have suffered over the years might be of help to someone.
If you are interested in a thorough comparison, I recommend "Cross Compare SQL Server 2008 Spatial, PostgreSQL/PostGIS 1.3-1.4, MySQL 5-6" and/or "Compare SQL Server 2008 R2, Oracle 11G R2, PostgreSQL/PostGIS 1.5 Spatial Features" by Boston GIS.
Considering your points:
I'm familiar with the DBMS: setting up a PostGIS database on Windows is easy, and management using PgAdmin3 is straightforward too
It does well with GIS queries: PostGIS is definitely the strongest of the three; only Oracle Spatial would be comparable, but it is disqualified if you consider its costs
Low cost: +1 for PostGIS for sure
Not antagonistic with Microsoft .NET Framework: You should at least be able to connect via ODBC (see Postgres wiki)
Compatible with R: shouldn't be a problem with any of the three
PostGis definitely. Here's why.
Postgres is far superior to MySQL in performance. The server is more fault-tolerant and has out-of-the-box tools for load balancing, caching, and optimization.
PostGIS is becoming a standard in GIS apps.
It's free.
Just a note that MySQL has finally added proper GIS logic.
http://dev.mysql.com/doc/refman/5.6/en/functions-for-testing-spatial-relations-between-geometric-objects.html
But I can't comment on cost or performance at this stage.
PostGIS is best because it is becoming a standard in GIS applications these days, and it is free. It is far superior to MySQL in performance.

MySQL vs PostgreSQL? Which should I choose for my Django project?

My Django project is going to be backed by a large database with several hundred thousand entries, and will need to support searching (I'll probably end up using djangosearch or a similar project.)
Which database backend is best suited to my project and why? Can you recommend any good resources for further reading?
For whatever it's worth, the creators of Django recommend PostgreSQL.
If you're not tied to any legacy system and have the freedom to choose a database back-end, we recommend PostgreSQL, which achieves a fine balance between cost, features, speed and stability. (The Definitive Guide to Django, p. 15)
As someone who recently switched a project from MySQL to PostgreSQL, I don't regret the switch. The main difference, from a Django point of view, is more rigorous constraint checking in PostgreSQL, which is a good thing; it's also a bit more tedious to do manual schema changes (a.k.a. migrations).
There are probably 6 or so Django database migration applications out there, and at least one doesn't support PostgreSQL. I don't consider this a disadvantage, though, because you can use one of the others or do the migrations manually (which is what I prefer at the moment).
Full text search might be better supported for MySQL. MySQL has built-in full text search that is usable from within Django, but it's pretty useless (no word stemming, phrase searching, etc.). I've used django-sphinx as a better option for full text searching in MySQL.
Full text searching is built in with PostgreSQL 8.3 (earlier versions need the TSearch module). Here's a good instructional blog post: Full-text searching in Django with PostgreSQL and tsearch2
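(For readers on newer stacks: Django has since gained first-class PostgreSQL full text search in django.contrib.postgres.search, so the tsearch2 plumbing is no longer necessary. A minimal sketch, assuming a hypothetical Post model with a body field:

from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector

vector = SearchVector("body")
query = SearchQuery("django migrations")
results = (Post.objects
           .annotate(rank=SearchRank(vector, query))
           .filter(rank__gt=0.0)   # keep only matching rows
           .order_by("-rank"))     # best matches first

This gives you stemming and ranking out of the box, unlike MySQL's built-in search described above.)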
large database with several hundred thousand entries,
This is not a large database; it's a very small one.
I'd choose PostgreSQL, because it has a lot more features. Most significant in this case: in PostgreSQL you can use Python as a procedural language.
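As an illustration of that point, a function whose body is ordinary Python can be registered once and then called from SQL like any other function. A minimal sketch using psycopg2, assuming the plpython3u extension is installed and with hypothetical connection details:

import psycopg2

conn = psycopg2.connect("dbname=app user=app")
cur = conn.cursor()

# Register a stored function whose body is plain Python
cur.execute("""
    CREATE OR REPLACE FUNCTION slugify(title text) RETURNS text AS $$
        return title.lower().replace(" ", "-")
    $$ LANGUAGE plpython3u;
""")
conn.commit()

cur.execute("SELECT slugify(%s)", ("Hello World",))
print(cur.fetchone()[0])  # hello-world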
Go with whichever you're more familiar with. MySQL vs PostgreSQL is an endless war. Both of them are excellent database engines and both are being used by major sites. It really doesn't matter in practice.
All the answers bring interesting information to the table, but some are a little outdated, so here's my two cents.
As of 1.7, migrations are now an integral feature of Django. So they documented the main differences that Django developers might want to know beforehand.
Backend Support
Migrations are supported on all backends that Django ships with, as
well as any third-party backends if they have programmed in support
for schema alteration (done via the SchemaEditor class).
However, some databases are more capable than others when it comes to schema migrations; some of the caveats are covered below.
PostgreSQL
PostgreSQL is the most capable of all the databases here in terms of schema support.
MySQL
MySQL lacks support for transactions around schema alteration operations, meaning that if a migration fails to apply you will have to manually unpick the changes in order to try again (it’s impossible to roll back to an earlier point).
In addition, MySQL will fully rewrite tables for almost every schema operation and generally takes a time proportional to the number of rows in the table to add or remove columns. On slower hardware this can be worse than a minute per million rows - adding a few columns to a table with just a few million rows could lock your site up for over ten minutes.
Finally, MySQL has relatively small limits on name lengths for columns, tables and indexes, as well as a limit on the combined size of all columns an index covers. This means that indexes that are possible on other backends will fail to be created under MySQL.
SQLite
SQLite has very little built-in schema alteration support, and so
Django attempts to emulate it by:
Creating a new table with the new schema
Copying the data across
Dropping the old table
Renaming the new table to match the original name
This process generally works well, but it can be slow and occasionally
buggy. It is not recommended that you run and migrate SQLite in a
production environment unless you are very aware of the risks and its
limitations; the support Django ships with is designed to allow
developers to use SQLite on their local machines to develop less
complex Django projects without the need for a full database.
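That four-step rebuild is easy to reproduce directly with Python's standard library; a minimal sketch with hypothetical table and column names:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO book (title) VALUES ('Dune')")

# 1. Create a new table with the new schema (here, an added column)
conn.execute("CREATE TABLE book_new (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
# 2. Copy the data across
conn.execute("INSERT INTO book_new (id, title) SELECT id, title FROM book")
# 3. Drop the old table
conn.execute("DROP TABLE book")
# 4. Rename the new table to match the original name
conn.execute("ALTER TABLE book_new RENAME TO book")
conn.commit()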
Even if PostgreSQL looks better, I find it has some performance issues with Django:
PostgreSQL is made to handle "long connections" (connection pooling, persistent connections, etc.)
MySQL is made to handle "short connections" (connect, do your queries, disconnect; it has some performance issues with a lot of open connections)
The problem is that Django does not support connection pooling or persistent connections; it has to connect to and disconnect from the database at each view call.
It will work with PostgreSQL, but connecting to PostgreSQL costs a LOT more than connecting to a MySQL database (on PostgreSQL, each connection gets its own process, which is a lot slower than just popping a new thread in MySQL).
Then you get some features like the Query Cache that can be really useful in some cases. (But you lose the superb text search of PostgreSQL.)
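(Worth noting for later readers: newer Django versions mitigate the reconnect cost described above with persistent connections, controlled by the CONN_MAX_AGE setting. A minimal settings sketch, with hypothetical credentials:

# settings.py
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "app",
        "USER": "app",
        "PASSWORD": "secret",
        "HOST": "localhost",
        "CONN_MAX_AGE": 600,  # reuse each connection for up to 10 minutes
    }
}

With this in place, the per-request connection overhead that penalizes PostgreSQL largely disappears.)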
When a migration fails in django-south, the developers encourage you not to use MySQL:
! The South developers regret this has happened, and would
! like to gently persuade you to consider a slightly
! easier-to-deal-with DBMS (one that supports DDL transactions)
Having gone down the road of MySQL because I was familiar with it (struggling to find a proper installer, plus a quick test of PostgreSQL's slow web "workbench" interface, put me off), at the end of the project, a few months after deployment, while looking into backup options, I found that you have to pay for MySQL's enterprise backup features. Gotcha right at the very end.
With MySQL I had to write some ugly monster raw SQL queries in Django, because there is no "select distinct per group" for retrieving the latest row per group. I'm also looking at PostgreSQL's full-text search and wishing I had used PostgreSQL.
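(For reference, the PostgreSQL feature in question is DISTINCT ON, which Django exposes as distinct(*fields) on PostgreSQL only. A minimal sketch, assuming a hypothetical Reading model with sensor_id and recorded_at fields:

# Latest reading per sensor -- equivalent to:
# SELECT DISTINCT ON (sensor_id) * FROM reading ORDER BY sensor_id, recorded_at DESC;
latest = (Reading.objects
          .order_by("sensor_id", "-recorded_at")
          .distinct("sensor_id"))

On MySQL this has to be emulated with a join or subquery, which is the raw SQL pain described above.)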
I recommend PostgreSQL even if you are familiar with MySQL, but your mileage may vary.
UPDATE: DBeaver is a great equivalent of the MySQL Workbench GUI tool and works with PostgreSQL very nicely (and many others, as it's a universal DB tool).
To add to previous answers:
"Full text search might be better supported for MySQL"
The FULLTEXT index in MySQL is a joke.
It only works with MyISAM tables, so you lose ACID, Transactions, Constraints, Relations, Durability, Concurrency, etc.
An INSERT/UPDATE/DELETE to a largish TEXT column (like a forum post) will rebuild a large part of the index. If it does not fit in myisam_key_buffer, large IO will occur. I've seen a single forum post insertion trigger 100MB or more of IO ... meanwhile the posts table is exclusively locked!
I did some benchmarking (3 years ago, so it may be stale...) which showed that on large datasets, Postgres full text search is basically 10-100x faster than MySQL's, and Xapian is 10-100x faster than Postgres (but not integrated).
Other reasons not mentioned are the extremely smart query optimizer, the large choice of join types (merge, hash, etc.), hash aggregation, GiST indexes on arrays, spatial search, etc., which can result in extremely fast plans on very complicated queries.
Will this application be hosted on your own servers or by a hosting company? Make sure that if you are using a hosting company, they support the database of choice.
There is a major licensing difference between the two databases that will affect you if you ever intend to distribute code using them. MySQL's client libraries are GPL, while PostgreSQL's are under a BSD-like license, which might be easier to work with.