I have been working on a web project that stores locations for users. The project currently uses MySQL 5.5 and JPA 2 for mapping the relational database, together with EJB 3.1 as the middle tier. I store longitude and latitude data in DECIMAL columns in MySQL.
I want to expand the project so the user can search for points (GPS coordinates) near a location he/she marks on the map (using the Google Maps API v3).
I need some hints and suggestions before I start transforming or changing components in my project, if possible with tutorials on how to achieve the changes and which tools (libraries, dev tools, etc.) to use. Here are my questions.
Can I use the spatial extension in MySQL (with data types like Point) and map these to entities supported by JPA 2 (for example through a library like DataNucleus, if it works with JPA 2)? Ideally the entities would be lightweight and able to persist themselves through the persistence layer.
Is it better to move to PostGIS and Postgres (which seem to have better spatial support), with all the pain and effort needed to change the database and use Hibernate instead? NetBeans has Hibernate support, but when trying the JDBC driver for PostGIS I stumble upon problems. I need a good tutorial to get started if I go down this road.
Or should I use the infrastructure that is already in place and compute the distance between locations A and B, similar to the approach in this article by Jan Philip Matuschek?
Excuse me for clustering the questions.
Regards Chris
MySQL's Spatial extension is usually OK if you only have point locations, but it starts to get overwhelmed if you want to find points in polygons (i.e., query by region), or use fancy indices and search algorithms. There is a really good cross comparison to help show the differences between the different spatial DBs.
I'd say that there are many more benefits to migrating to PostgreSQL and PostGIS. There are also many more GIS applications that work natively with PostGIS, such as QGIS, GeoServer, etc. The most recent version of PostGIS has a KNN (k-nearest-neighbor) search operator to quickly find, for example, the nearest 25 points out of several million rows to a particular point on your map (see also here).
See the manual for JDBC and PostGIS. As for JPA, see PostGIS and JPA 2.0
As for distance calculations, see ST_Distance and ST_Distance_Spheroid.
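If you stay with plain latitude/longitude columns and compute distances in application code, a minimal sketch of the great-circle (haversine) distance could look like the following. Note the assumptions: it treats the Earth as a sphere with a mean radius of 6371.0088 km, so it is closer in spirit to a spherical distance than to ST_Distance_Spheroid, and the function name haversine_km is made up for this example.

```python
import math

# Assumption: spherical Earth with the mean radius in kilometres.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    # Haversine formula: a is the squared half-chord length between the points.
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


if __name__ == "__main__":
    # One degree of longitude along the equator.
    print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 3))
```

For short distances the spherical error is usually small compared to GPS noise, but if you need spheroid accuracy it is probably better to let the database do the calculation.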
There are multiple tutorials and questions across the Internet, YouTube and Stack Overflow about finding nearby businesses given a location, for example this Stack Overflow question:
Returning nearby locations in Django
But one thing they all have in common is that they prefer PostgreSQL (instead of MySQL) for Django's GeoDjango library.
I am building a project as:
Here a user can register as a customer as well as a business (the customer's and the business's name, address and all other fields will be separate, even if it is the same user).
This is not the actual database design, only a rough idea of the project.
Both customers and businesses will have their locations stored.
Customers can find nearby businesses around them.
I was wondering what the specific advantages of using PostgreSQL over MySQL are in the context of computing and fetching location-related fields.
(MySQL is a well tested database for years and most of my data is relational, so I was planning to use MySQL or Microsoft SQL Server)
Would there be any processing disadvantages, in terms of the algorithms used to compute nearby businesses, if I choose to go with MySQL? How would it make my system slow?
But one thing they all have in common is that they prefer PostgreSQL (instead of MySQL) for Django's GeoDjango library.
The reason why they suggest using Postgres is that it has better support for spatial data. It's not that MySQL doesn't support spatial data. However, there is a long list of features which Postgres supports and MySQL doesn't. You can look at this page for details. Almost every time MySQL is mentioned on that page, it is to describe a feature that it does not support, but that Postgres does.
(MySQL is a well tested database for years and most of my data is relational, so I was planning to use MySQL or Microsoft SQL Server)
Note that foreign key constraints are not compatible with MyISAM, which is the only MySQL database engine which supports spatial indexes. So if you pick MySQL, you need to choose between referential integrity and fast spatial lookups.
If you use Postgres, you can have both referential integrity and fast spatial lookups. Postgres is also a quite mature and widely used relational database these days.
Would there be any processing disadvantages, in terms of the algorithms used to compute nearby businesses, if I choose to go with MySQL? How would it make my system slow?
It really depends on how many businesses you're searching for. If you pick an engine that does not support spatial indexes, MySQL is forced to do a full table scan, which takes O(N) time. On the other hand, it can do bounding box comparisons to eliminate many geometries quite quickly. I have seen acceptable interactive performance for 100k points, with performance dropping off after that. In contrast, Postgres with a spatial index is fast for any number of points.
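To make the "full table scan plus bounding box" behaviour concrete, here is a rough sketch of what happens without a spatial index: every row is visited, a cheap rectangle test rejects most of them, and only the survivors get the exact distance check. This is an illustration in application code, not what MySQL does internally; the function names, the sample points and the spherical Earth radius are all assumptions, and the bounding box ignores the poles and the 180th meridian.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumption: spherical Earth


def distance_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def bounding_box(lat, lon, radius_km):
    """Rectangle (min_lat, max_lat, min_lon, max_lon) that encloses the search circle."""
    angular = radius_km / EARTH_RADIUS_KM
    d_lat = math.degrees(angular)
    d_lon = math.degrees(math.asin(math.sin(angular) / math.cos(math.radians(lat))))
    return lat - d_lat, lat + d_lat, lon - d_lon, lon + d_lon


def nearby(points, lat, lon, radius_km):
    """Linear scan over (name, lat, lon) rows; returns names sorted by distance."""
    min_lat, max_lat, min_lon, max_lon = bounding_box(lat, lon, radius_km)
    hits = []
    for name, p_lat, p_lon in points:  # O(N): every row is visited
        if not (min_lat <= p_lat <= max_lat and min_lon <= p_lon <= max_lon):
            continue  # cheap rejection, no trigonometry needed
        d = distance_km(lat, lon, p_lat, p_lon)
        if d <= radius_km:  # exact check only for the few survivors
            hits.append((d, name))
    return [name for _, name in sorted(hits)]


# Hypothetical sample rows: (name, latitude, longitude).
BUSINESSES = [
    ("Times Square", 40.7580, -73.9855),
    ("Brooklyn Bridge", 40.7061, -73.9969),
    ("Philadelphia", 39.9526, -75.1652),
]

if __name__ == "__main__":
    print(nearby(BUSINESSES, 40.7829, -73.9654, 5))
```

A spatial index replaces the loop over all rows with a tree lookup, which is why it keeps scaling where the linear scan eventually does not.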
I would like to know what is the best way to store information about the GPS position of the device in real time using NodeJS and some database server.
Example:
Currently there are 5,000 devices connected to the server, and each of these devices sends its GPS position at most 1 second apart.
That means the NodeJS server will be receiving approximately 300,000 PUT requests per minute.
That is a lot of requests, I know.
Another important factor to take into consideration is that the database server is technically another server or instance.
By the way, I plan to use Amazon AWS or Microsoft Azure as a cloud server provider.
So my question is, what is the most efficient way to store this information in real time?
Among my options are MySQL, MongoDB, PostgreSQL, etc.
Please explain why the option you recommend is better.
Thank you!
I would say MongoDB.
Here is why:
MongoDB offers a number of indexes and query mechanisms to handle geospatial information. You can easily store your location data as GeoJSON objects with this coordinate-axis order: longitude, latitude.
MongoDB also recently included support for additional GeoJSON types: MultiPoint, MultiLineString, MultiPolygon, GeometryCollection.
MongoDB also offers a vast set of query operators that allows you to easily query over the geospatial data such as:
$geoWithin
$geoIntersects
$near
A typical example document would be (note the order: longitude first, then latitude, with western longitudes negative):
{
  location: {
    type: "Point",
    coordinates: [-73.9654, 40.7829]
  },
  name: "Central Park, New York City"
}
You can go over a wide range of examples at https://docs.mongodb.com/v3.2/tutorial/geospatial-tutorial/
Lastly, MongoDB lets you build geospatial indexes on the data (2dsphere for GeoJSON objects, 2d for legacy coordinate pairs). Given the usage you mentioned, it should come in handy.
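As a small illustration of the $near operator mentioned above, this sketch builds the query document as a plain Python dict. The operators $near, $geometry and $maxDistance are MongoDB's own; the helper names geojson_point and near_query and the field name "location" are just assumptions for the example, and no database driver is involved here.

```python
def geojson_point(lon, lat):
    """Build a GeoJSON Point; GeoJSON order is longitude first, then latitude."""
    if not -180 <= lon <= 180:
        raise ValueError("longitude out of range: %r" % lon)
    if not -90 <= lat <= 90:
        raise ValueError("latitude out of range: %r" % lat)
    return {"type": "Point", "coordinates": [lon, lat]}


def near_query(field, lon, lat, max_distance_m):
    """Query document matching points within max_distance_m metres, nearest first."""
    return {
        field: {
            "$near": {
                "$geometry": geojson_point(lon, lat),
                "$maxDistance": max_distance_m,
            }
        }
    }


if __name__ == "__main__":
    # Devices within 500 m of Central Park.
    print(near_query("location", -73.9654, 40.7829, 500))
```

With a driver you would pass this dict to the collection's find call, and the queried field needs a geospatial index for $near to work. The range check catches some swapped latitude/longitude pairs, but not all of them, since many latitudes are also valid longitudes.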
As already said, MongoDB has everything you need in one place.
On the index side, you can also consider a partial index in order to exclude documents that you no longer need (for example, indexing only documents less than 12 months old): https://docs.mongodb.com/v3.2/core/index-partial/
On the cloud side, I strongly suggest MongoDB Atlas.
https://www.mongodb.com/cloud/atlas/pricing
I'm about to start a GIS project, and I want to know if I really need PostGIS or if MySQL's Spatial Extensions are sufficient?
PostGIS is a complete spatialdb and has been for a while. MySQL continues to only implement the Minimum Bounding Rectangle functions
http://dev.mysql.com/doc/refman/5.5/en/functions-for-testing-spatial-relations-between-geometric-objects.html
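To show why MBR-only functions matter, here is a small sketch comparing a minimum-bounding-rectangle test with an exact point-in-polygon test. It is plain Python with made-up function names, not MySQL or PostGIS code, and the ray-casting test does not treat points exactly on the boundary specially.

```python
def mbr(polygon):
    """Minimum bounding rectangle of a polygon given as a list of (x, y) vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def mbr_contains(polygon, point):
    """What an MBR-based 'contains' answers: is the point inside the rectangle?"""
    min_x, min_y, max_x, max_y = mbr(polygon)
    x, y = point
    return min_x <= x <= max_x and min_y <= y <= max_y


def polygon_contains(polygon, point):
    """Exact test using ray casting (even-odd rule)."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through the point
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # crossing lies to the right of the point
                inside = not inside
    return inside


TRIANGLE = [(0, 0), (4, 0), (0, 4)]

if __name__ == "__main__":
    p = (3, 3)  # inside the rectangle around the triangle, outside the triangle
    print(mbr_contains(TRIANGLE, p), polygon_contains(TRIANGLE, p))
```

The rectangle test reports a hit for a point that is clearly outside the triangle, which is the kind of false positive you get when only MBR functions are available.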
If you want to do spatial work on FOSS DB your only real choices at this point are
1) PostGIS
2) Spatialite
There are community editions of the other DBs that have spatial extensions as well that you can use.
EDIT: I have been using Postgres with PostGIS for a few months now, and I am satisfied.
I need to analyze a few million geocoded records, each of which will have latitude and longitude. These records include data of at least three different types, and I will be trying to see if each set influences the other.
What database is best as the underlying data store for all this data? Here are my requirements:
I'm familiar with the DBMS. I'm weakest with PostgreSQL, but I am willing to learn if everything else checks out.
It does well with GIS queries. Google searches suggest that PostgreSQL + PostGIS may be the strongest? At least a lot of products seem to use it. MySql's Spatial Extensions seem comparatively minimal?
Low cost. Despite the 10GB DB limit in SQL Server Express 2008 R2, I'm not sure I want to live with this and other limitations of the free version.
Not antagonistic with Microsoft .NET Framework. Thanks to Connector/Net 6.3.4, MySQL works well with C# and .NET Framework 4 programs. It fully supports .NET 4's Entity Framework. I cannot find any noncommercial PostgreSQL equivalent, although I'm not opposed to paying $180 for Devart's dotConnect for PostgreSQL Professional Edition.
Compatible with R. It appears all 3 of these can talk with R using ODBC, so may not be an issue.
I've already done some development using MySql, but I can change if necessary.
I have worked with all three databases and done migrations between them, so hopefully I can still add something to an old post. Ten years ago I was tasked with loading a largish dataset (450 million spatial objects) from GML into a spatial database. I decided to try out MySQL and Postgis. At the time there was no spatial support in SQL Server and we had a small startup atmosphere, so MySQL seemed a good fit. I subsequently became involved with MySQL: I attended and spoke at a couple of conferences and was heavily involved in beta testing the more GIS-compliant functions in MySQL that were finally released with version 5.5. I have since been involved with migrating our spatial data to Postgis and our corporate data (with spatial elements) to SQL Server. These are my findings.
MySQL
1). Stability issues. Over the course of 5 years, we had several database corruption issues, which could only be fixed by running myisamchk on the index file, a process that can take well over 24 hours on a 450 million row table.
2). Until recently only MyISAM tables supported the spatial data type. This means if you want transaction support you are out of luck. InnoDB table type does now support spatial types, but not indexes on them, which given the typical sizes of spatial data sets, isn't terribly useful. See http://dev.mysql.com/doc/refman/5.0/en/innodb-restrictions.html My experience from going to conferences was that spatial was very much an afterthought -- we've implemented replication, partitioning, etc, but it doesn't work with spatial.
EDIT: In the upcoming 5.7.5 release InnoDB will finally support indexes on spatial columns, meaning that ACID, foreign keys and spatial indexes will finally be available in the same engine.
3). The spatial functionality is extremely limited compared to both Postgis and SQL Server spatial. There is still no ST_Union function that acts on an entire geometry field, one of the queries I run most often, i.e., you can't write:
select some_attribute, ST_Union(geom) from some_table group by some_attribute
which is very useful in a GIS context. Being limited to Select ST_Union(geom1, const_geom) from some_table, i.e., where one of the geometries is a hard-coded constant, is quite restrictive in comparison.
4). No support for rasters. Being able to do combined vector-raster analysis within a db is very useful GIS functionality.
5). No support for conversion from one spatial reference system to another.
6). Since the acquisition by Oracle, spatial has really been put on hold.
Overall, to be fair to MySQL it supported our website, WMS and general spatial processing for several years, and was easy to set up. On the downside, data corruption was an issue, and by being forced to use MyISAM tables you are giving up a lot of the benefits of an RDBMS.
Postgis
Given the issues we had with MySQL, we ultimately converted to Postgis. The key points of this experience have been.
1). Extreme stability. No data corruption in 5 years and we now have around 25 Postgres/GIS boxes on CentOS virtual machines, under varying degrees of load.
2). Rapid pace of development -- raster, topology, 3D support being recent examples of this.
3). Very active community. The Postgis irc channel and mailing list are excellent resources. The Postgis reference manual is also excellent. http://postgis.net/docs/manual-2.0/
4). Plays very well with other applications, under the OSGeo umbrella, such as GeoServer and GDAL.
5). Stored procedures can be written in many languages, apart from the default plpgsql, such as Python or R.
6). Postgres is a very standards compliant, fully featured RDBMS, which aims to stay close to the ANSI standards.
7). Support for window functions and recursive queries -- not in MySQL, but in SQL Server. This has made writing more complex spatial queries cleaner.
SQL Server.
I have only used SQL Server 2008 spatial functionality, and many of the annoyances of that release -- lack of support for conversions from one CRS to another, the need to add your own parameters to spatial indexes -- have now been resolved.
1). As spatial objects in SQL Server are basically CLR objects, the syntax feels backwards. Instead of ST_Area(geom) you write geom.STArea() and this becomes even more obvious when you chain functions together. The dropping of the underscore in function names is merely a minor annoyance.
2). I have had a number of invalid polygons that have been accepted by SQL Server, and the lack of a ST_MakeValid function can make this a bit painful.
3). Windows only. In general, Microsoft products (like ESRI ones) are designed to work very well with each other, but don't always have standards compliance and interoperability as primary objectives. If you are running a Windows-only shop, this is not an issue.
UPDATE: having played a bit with SQL Server 2012, I can say that it has been improved significantly. There is now a good geometry validation function, there is good support for the Geography data type, including a FULL GLOBE object, which allows representing objects that occupy more than one hemisphere and support for Compound Curves and Circular Strings which is useful for accurate and compact representations of arcs (and circles) among other things. Transforming coordinates from one CRS to another still needs to be done in 3rd party libraries, though this is not a show stopper in most applications.
I haven't used SQL Server with large enough datasets to compare one on one with Postgis/MySQL, but from what I have seen the functions behave correctly, and while not quite as fully featured as Postgis, it is a huge improvement on MySQL's offerings.
Sorry for such a long answer, I hope some of the pain and joy I have suffered over the years might be of help to someone.
If you are interested in a thorough comparison, I recommend "Cross Compare SQL Server 2008 Spatial, PostgreSQL/PostGIS 1.3-1.4, MySQL 5-6" and/or "Compare SQL Server 2008 R2, Oracle 11G R2, PostgreSQL/PostGIS 1.5 Spatial Features" by Boston GIS.
Considering your points:
I'm familiar with the DBMS: setting up a PostGIS database on Windows is easy, and management with PgAdmin3 is straightforward too
It does well with GIS queries: PostGIS is definitely strongest of the three, only Oracle Spatial would be comparable but is disqualified if you consider its costs
Low cost: +1 for PostGIS for sure
Not antagonistic with Microsoft .NET Framework: You should at least be able to connect via ODBC (see Postgres wiki)
Compatible with R: shouldn't be a problem with any of the three
PostGis definitely. Here's why.
Postgres is far superior to MySQL in performance. Server is more fault tolerant, has out of the box tools for load-balancing, caching and optimization.
PostGIS is becoming a standard in GIS apps.
It's free.
Just a note that MySQL has finally added proper GIS logic.
http://dev.mysql.com/doc/refman/5.6/en/functions-for-testing-spatial-relations-between-geometric-objects.html
But I can't comment on cost or performance at this stage
PostGIS is best because it is becoming a standard in GIS applications these days and PostGIS is free. It is far superior to MySQL in performance
Have any of you had experience with using NoSQL (non-relational) databases to store spatial data? Are there any potential benefits (speed, space, ...) of using such databases to hold data for, say, a desktop application (compared to using SpatiaLite or PostGIS)?
I've seen posts about using MongoDB for spatial data, but I'm interested in some performance comparison.
Graph databases like Neo4j are a very good fit, especially as you can add different indexing schemes dynamically as you go. Typical things you can do on your base data are of course 1D indexing (e.g. Timeline or B-trees) or funkier stuff like Hilbert curves; see Nick's blog. Also, for a live demonstration, look at the AWE open source GIS desktop tool here, with the underlying indexed graph visible at around 07:00.
Currently, MongoDB uses geohashing with B-trees which will be slower than the R-trees of PostGIS (I can't give exact numbers, I'm afraid, but there is plenty of theoretical literature on the differences). However, in these slides, http://www.slideshare.net/nknize/rtree-spatial-indexing-with-mongodb-mongodc the author talks about adding R-trees to MongoDB and sharding on a geo key. You talk about desktop use, so geosharding may not be of interest, as sharding's benefits will be felt more on massive datasets.
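For readers unfamiliar with geohashing, here is a minimal sketch of the classic textual geohash encoding. It only illustrates the general idea of turning two dimensions into a single sortable key that a B-tree can index; it is not claimed to match MongoDB's internal representation, and the function name geohash_encode is an assumption for this example.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=9):
    """Encode a latitude/longitude pair as a geohash string of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid  # keep the upper half of the interval
            else:
                bits <<= 1
                lon_hi = mid  # keep the lower half of the interval
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


if __name__ == "__main__":
    print(geohash_encode(57.64911, 10.40744, 9))
```

Nearby points usually share a long common prefix, so a range scan on the B-tree finds candidates. The weakness compared to an R-tree is that two points can be very close and still fall on opposite sides of a cell boundary with quite different hashes, so neighbouring cells have to be searched as well.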
Ultimately, it probably comes down more to what you want to do with your spatial data. Postgis has vastly more functions and support for topology, rasters, 3D, conversions between coordinate systems, so if this is what you are looking for, PostGIS would still be the best option. If you are interested in storing billions/trillions of spatial objects and just running basic find all points near/inside this point based on some criteria, then MongoDB is likely a very good choice.
CouchDB also has a simple spatial extension
http://vmx.cx/cgi-bin/blog/index.cgi/category/CouchDB
I've been storing spatial data with ZODB. There's surely some inherent performance advantage in accessing local file data (SpatiaLite) or a Unix socket (PostGIS) compared to TCP or HTTP requests (CouchDB etc.), but having a spatial index makes the biggest difference. I'm using the same R-trees mentioned in the MongoDB article, but there are plenty of good options. The JTS Topology Suite has various spatial indexes for Java.
Cassandra is also an option for spatial data:
http://www.readwriteweb.com/cloud/2011/02/video-simplegeo-cassandra.php
Tarantool supports spatial two-dimensional index (RTREE) with nearest neighbor search, overlaps, contains, and other spatial operators. Tarantool maintains the entire data set in RAM, making it the only OSS in-memory database with spatial index support.
https://github.com/tarantool/tarantool/wiki/R-tree-index-quick-start-and-usage
MarkLogic (Enterprise NoSQL) provides spatial functionality. This NoSQL product gives GIS applications the ability to conflate multiple objects into one entity. This provides support for managing relationships across structured and unstructured content, provenance and pedigree information about the data, historic and timeline information, etc. in a single entity.