I'm developing an application using Google Maps API. The goal is to geocode certain locations and then allow users to search for these locations based on which ones are nearest to the user (e.g. "Thing x is 20 miles from you").
In MySQL, I can just store the geo-coordinates and use the haversine formula to do the distance calculations. Someone has suggested I consider Postgres because it has "better support for geographical data."
So, the question is: what are the pros and cons of using MySQL or Postgres?
PostgreSQL has built-in support for what you are talking about. But for a lot more functionality (and for what the "someone" you mention was probably thinking of), turn to PostGIS.
See the home page, documentation, or start at good old Wikipedia for an overview.
Edit after question in comment:
In particular, see the function support matrix to get an impression of what PostGIS can do for you.
Computing the distance between two points is a standard feature. You can have that for a variety of data types. Which data type to use? See this question in the FAQ and further links there.
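For a concrete feel, here is a minimal sketch of such a point-to-point distance query against PostGIS, run from Python with psycopg2; the table and column names (and the connection string) are hypothetical, not something from the answer above.

```python
# A minimal sketch, assuming a PostGIS-enabled database and a hypothetical
# "places" table with a point geometry column in WGS84 (SRID 4326).
import psycopg2

conn = psycopg2.connect("dbname=gis user=gis_user")  # placeholder connection details
cur = conn.cursor()

# Distance in metres from the user's position to every stored place; casting
# to geography makes PostGIS measure on the spheroid rather than on a plane.
cur.execute(
    """
    SELECT name,
           ST_Distance(geom::geography,
                       ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography) AS metres
    FROM places
    ORDER BY metres
    LIMIT 10;
    """,
    (-79.3832, 43.6532),  # lon, lat of the user
)
for name, metres in cur.fetchall():
    print(f"{name}: {metres / 1609.34:.1f} miles away")
```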
If it is just points, MySQL is fine. If you have more complex geometries, like delivery routes, or cell reception areas, or whatever, you want PostGIS, because it supports more sophisticated indexing of geometric data (R-trees). MyISAM is actually better than InnoDB for spatial data, by the way, because it also supports R-tree spatial indexes (though not queries as powerful as PostGIS's). If you just need points, though, InnoDB or MyISAM B-trees are adequate. If bounding boxes are enough (i.e. you need everything within a rectangular area around some point), then geohash-based indexes are OK. More background on all that here. It is well worth the trouble getting familiar with PostGIS and Postgres; they are both remarkably good projects and by far my preferred relational database, but just looking up points does not require them.
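To make the "just points in MySQL" case concrete, here is a rough sketch (table name, columns, and credentials are all hypothetical) that prefilters with an index-friendly bounding box and then computes the exact haversine distance in SQL:

```python
# A rough sketch of the plain-points approach: a bounding-box prefilter that a
# normal B-tree index on (lat, lon) can use, followed by an exact haversine
# distance computed by MySQL. Table, columns, and credentials are placeholders.
import math
import mysql.connector

RADIUS_MILES = 20
user_lat, user_lon = 43.6532, -79.3832

# Degrees of latitude/longitude that roughly cover the search radius.
lat_delta = RADIUS_MILES / 69.0
lon_delta = RADIUS_MILES / (69.0 * math.cos(math.radians(user_lat)))

conn = mysql.connector.connect(user="app", database="geo")  # placeholder credentials
cur = conn.cursor()
cur.execute(
    """
    SELECT name, lat, lon,
           3959 * 2 * ASIN(SQRT(
               POWER(SIN(RADIANS(lat - %s) / 2), 2) +
               COS(RADIANS(%s)) * COS(RADIANS(lat)) *
               POWER(SIN(RADIANS(lon - %s) / 2), 2))) AS miles
    FROM places
    WHERE lat BETWEEN %s AND %s
      AND lon BETWEEN %s AND %s
    ORDER BY miles
    """,
    (user_lat, user_lat, user_lon,
     user_lat - lat_delta, user_lat + lat_delta,
     user_lon - lon_delta, user_lon + lon_delta),
)
for name, lat, lon, miles in cur.fetchall():
    if miles <= RADIUS_MILES:  # trim the corners of the bounding box
        print(f"{name} is {miles:.1f} miles from you")
```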
You could also check out MongoDB; it supports geospatial indexing, letting you query for the nearest objects very efficiently. That's what the folks at Foursquare use to find nearby venues.
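A minimal sketch of what that looks like from Python with pymongo (database, collection, and field names are hypothetical): documents store GeoJSON points, a 2dsphere index is created, and $near returns venues sorted by distance.

```python
# A minimal sketch of MongoDB's geospatial querying; names are illustrative.
from pymongo import MongoClient

client = MongoClient()  # assumes a local mongod
venues = client.geo_demo.venues

venues.create_index([("location", "2dsphere")])
venues.insert_one({
    "name": "Example Cafe",
    "location": {"type": "Point", "coordinates": [-79.3832, 43.6532]},  # [lon, lat]
})

# Nearest venues within ~20 miles (about 32,187 metres) of the user.
nearby = venues.find({
    "location": {
        "$near": {
            "$geometry": {"type": "Point", "coordinates": [-79.38, 43.65]},
            "$maxDistance": 32187,
        }
    }
})
for doc in nearby:
    print(doc["name"])
```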
Related
We are creating a speed limit map application, using different colors to highlight streets with different speed limits (similar to the ITO speed limit map: http://www.itoworld.com/map/124?lon=-79.37151&lat=43.74796&zoom=12).
The problem we have is that we are conducting our own research and have our own speed limit data, instead of pulling the data from OpenStreetMap or Google Maps like the ITO map does. We also need to create a data store so we can dynamically update the map as we add more speed limit information in the future.
Is there any way to create our own instance of OpenStreetMap and replace only the speed limit information with our own data? We don't have any vector data and we have no experience working with it.
Are there any suggestions for tools to use for creating highlight layers based on the speed limits we have? Is OpenLayers a good option?
Any help is appreciated, thank you very much.
Update 2013/11/20
Thank you very much for your answers; now we have a much better understanding of our problem. This is a university design project, so we basically have no budget. We are looking for:
1) A basic "base map" that includes the basic tile information (OpenStreetMap seems a good choice, since the Google Maps API doesn't provide free road information as far as we can find)
2) A geo data server that can host our own street speed limit data (GeoServer and MapServer look like good choices), or a simple database design that can fulfill our needs (we don't know whether that is possible yet)
3) A plotting tool that can render our speed limit data as a "group of lines" on the map, since these data will change frequently (OpenLayers and Leaflet are good candidates).
Is there anything else needed?
What you want to do is a trivial programming task once you have decided a few things:
These are probably the three biggest questions you need to answer. I have added some commentary, but look into each of these questions beyond this post to find what works for you.
Who do you want to use for your map? Since you only have one type of data you will want to display that data on someone else's nice looking map. The big choices are Bing, Google, OpenLayers/OSM, and ESRI. Your choice will most likely be driven by the licensing of the above services and if you are willing to pay or not. A need to support mobile devices may also factor into your decision. Since the map is what your users will see, choose the best looking map you can afford.
How will you serve up your data? You have several options to serve your speed limit data. GeoServer, MapServer, and ESRI are some popular mapping software packages. If you are only displaying a few layers of data, all of these mapping packages will be overkill. The actual software that renders your map data will most likely affect only your pocket book, so free is usually good here.
Tiles vs Lines
You will serve your data either as a group of lines sent to the browser, or as pre-rendered tiles to be loaded on top of the map. If your data changes frequently, you will want to serve it dynamically as line data (an array of points). If your data does not change frequently, you should consider tiling it. Tiling involves pre-rendering the entire map at all zoom levels. This allows the map to be loaded very fast, and it is how almost all base maps are rendered. The downside is that tile generation can take a long time, and tiles can take up a large amount of space.
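If you go the dynamic route, one way to hand the line data to the browser is as GeoJSON features tagged with the speed limit, which OpenLayers or Leaflet can then style by colour. A small sketch, assuming a tiny Flask service and made-up sample data:

```python
# A sketch of serving speed-limit segments dynamically as GeoJSON.
# The framework choice, endpoint name, and sample data are all assumptions.
from flask import Flask, jsonify

app = Flask(__name__)

# In a real system this would come from your data store.
SEGMENTS = [
    {"name": "Main St", "speed_limit": 40,
     "points": [[-79.372, 43.748], [-79.368, 43.749]]},
    {"name": "King Ave", "speed_limit": 60,
     "points": [[-79.380, 43.744], [-79.375, 43.746]]},
]

@app.route("/speed-limits")
def speed_limits():
    features = [
        {
            "type": "Feature",
            "geometry": {"type": "LineString", "coordinates": seg["points"]},
            "properties": {"name": seg["name"], "speed_limit": seg["speed_limit"]},
        }
        for seg in SEGMENTS
    ]
    return jsonify({"type": "FeatureCollection", "features": features})

if __name__ == "__main__":
    app.run()
```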
This is a very broad question. There are many components to drawing your own speed limit map.
On the front-end, there is a web browser map interface. OpenLayers is good at that. There are plenty of other tools that can do this as well, such as Leaflet or even Google Maps API.
Next is something to provide the actual speed limit route data. This can be served as a vector layer or a raster layer. There are plenty of tools here, too. UMN Mapserver is free and reasonably good. ESRI makes a whole fleet of products in this area as well.
The speed limit route data also needs to be saved somehow. This can be done in files or in a database such as PostGIS. Again, lots of great options.
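As one illustration of the storage option, here is a sketch of a PostGIS table for the speed-limit routes, queried back as GeoJSON for whatever map viewport is on screen (table name, columns, SRID, and connection string are all assumptions):

```python
# A sketch of keeping the speed-limit routes in PostGIS (requires the postgis
# extension); names and the bounding box are placeholders.
import psycopg2

conn = psycopg2.connect("dbname=speedmap user=app")  # placeholder connection
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS speed_limits (
        id          serial PRIMARY KEY,
        street_name text,
        speed_kmh   integer,
        geom        geometry(LineString, 4326)
    );
    CREATE INDEX IF NOT EXISTS speed_limits_geom_idx
        ON speed_limits USING GIST (geom);
""")
conn.commit()

# Pull everything in the current map view as GeoJSON, ready for the client.
cur.execute("""
    SELECT street_name, speed_kmh, ST_AsGeoJSON(geom)
    FROM speed_limits
    WHERE geom && ST_MakeEnvelope(%s, %s, %s, %s, 4326);
""", (-79.40, 43.73, -79.34, 43.76))
rows = cur.fetchall()
```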
It is the role of the system architect to determine which technologies to employ to solve the problem.
I'm trying to develop an application that uses information from Google/Bing Maps, but I need the vertex data to recreate roads, and I can't use images since I can't get road names and height info.
I need the vertices/nodes of streets (with latitude, longitude, altitude, and street name) and no visual data.
Thanks.
Open Street Map is definitely the way to go for this - extracting vertex information from Bing/Google is both technically difficult, and a breach of the Terms of Use. OSM data is better quality in many cases and, more to the point, free to use under a CC-BY-SA licence.
You'll also probably need a spatial database in which to store the information. I've written a couple of articles about loading OSM data into SQL Server which you might find helpful. e.g.:
http://alastaira.wordpress.com/2011/04/15/loading-open-street-map-data-in-sql-server-part-ii-ways/
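If you do go the OSM route, extracting street names and vertex coordinates from a raw .osm XML extract is straightforward; here is a rough sketch with the standard library (the file name and the filtering rules are assumptions, and note that OSM data generally does not carry altitude):

```python
# A rough sketch of pulling street vertices (lat, lon, street name) out of an
# OSM XML extract. No error handling; the file name is a placeholder.
import xml.etree.ElementTree as ET

tree = ET.parse("extract.osm")  # e.g. downloaded from Geofabrik or the Overpass API
root = tree.getroot()

# First pass: node id -> (lat, lon)
nodes = {
    n.get("id"): (float(n.get("lat")), float(n.get("lon")))
    for n in root.iter("node")
}

# Second pass: named roads and the coordinates of their vertices.
for way in root.iter("way"):
    tags = {t.get("k"): t.get("v") for t in way.findall("tag")}
    if "highway" in tags and "name" in tags:
        coords = [nodes[nd.get("ref")] for nd in way.findall("nd") if nd.get("ref") in nodes]
        print(tags["name"], coords)
```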
I was wondering what are the advantages of using Triple Stores over a relational database?
The viewpoint of the CTO of a company that extensively uses RDF Triplestores commercially:
Schema flexibility - it's possible to do the equivalent of a schema change to an RDF store live, without any downtime or redesign. It's not a free lunch - you need to be careful with how your software works - but it's a pretty easy thing to do.
More modern - RDF stores are typically queried over HTTP, so it's very easy to fit them into service architectures without hacky bridging solutions or performance penalties. They also handle internationalised content better than typical SQL databases - e.g. you can have multiple values in different languages.
Standardisation - the level of standardisation of implementations using RDF and SPARQL is much higher than SQL. It's possible to swap out one triplestore for another, though you have to be careful you're not stepping outside the standards. Moving data between stores is easy, as they all speak the same language.
Expressivity - it's much easier to model complex data in RDF than in SQL, and the query language makes it easier to do things like LEFT JOINs (called OPTIONAL in SPARQL). Conversely though, if your data is very tabular, then SQL is much easier.
Provenance - SPARQL lets you track where each piece of information came from, and you can store metadata about it, letting you easily do sophisticated queries that only take into account data from certain sources, with a certain trust level, or from some date range, etc.
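To make the OPTIONAL and provenance points concrete, here is a small sketch of a SPARQL query run from Python with SPARQLWrapper; the endpoint URL and the vocabulary are purely illustrative, not a real service:

```python
# A small sketch: OPTIONAL behaves like a LEFT JOIN, and GRAPH exposes which
# named graph (i.e. which source) each fact came from. Endpoint and predicates
# are hypothetical.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://example.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX ex: <http://example.org/schema#>
    SELECT ?product ?label ?price ?source
    WHERE {
      GRAPH ?source {                           # provenance: which graph holds the facts
        ?product a ex:Product ;
                 ex:label ?label .
        OPTIONAL { ?product ex:price ?price }   # like a LEFT JOIN: price may be absent
      }
    }
    LIMIT 10
""")
results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["label"]["value"],
          row.get("price", {}).get("value", "n/a"),
          row["source"]["value"])
```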
There are downsides though. SQL databases are generally much more mature, and have more features than typical RDF databases. Things like transactions are often much cruder, or non-existent. Also, the cost per unit of information stored in RDF vs. SQL is noticeably higher. It's hard to generalise, but it can be significant if you have a lot of data - though at least in our case it's an overall benefit financially, given the flexibility and power.
Both commenters are correct, especially since the Semantic Web is not a database; it's a bit more general than that.
But I guess you might mean triple store, rather than Semantic Web in general, as triple store v. relational database is a somewhat more meaningful comparison. I'll preface the rest of my answer by noting that I'm not an expert in relational database systems, but I have a little bit of knowledge about triple stores.
Triple (or quad) stores are basically databases for data on the semantic web, particularly RDF. That's kind of where the similarity between triple stores and relational databases ends. Both store data, both have query languages, both can be used to build applications on top of; so I guess if you squint your eyes, they're pretty similar. But the type of data each stores is quite different, so the two technologies optimize for different use cases and data structures, and they're not really interchangeable.
A lot of people have done work on overlaying a triples view of the world on top of a relational database, and that can work, but it will be slower than a system dedicated to storing and retrieving triples. Part of the problem is that SPARQL, the standard query language used by triple stores, can require a lot of self joins, something relational databases are not optimized for. If you look at benchmarks, such as SP2B, you can see that Oracle, which just overlays SPARQL support on its relational system, runs in the middle or at the back of the pack when compared with systems that more natively support RDF.
Of course, the RDF systems would probably get crushed by Oracle if they were doing SQL queries over relational data. But that's kind of the point, you pick the tool that's well suited for the application you want to build.
So if you're thinking about building a semantic web application, or just trying to get some familiarity in the area, I'd recommend ultimately going with a dedicated triple store.
I won't delve into reasoning and how that plays into query answering in triple stores, as that's yet another discussion, but it's another important distinction between relational systems and triple stores that do reasoning.
Some triplestores (Virtuoso, Jena SDB) are based on relational databases and simply provide an RDF/SPARQL interface. So to rephrase the question slightly: are triplestores built from the ground up as triplestores more performant than those that aren't? @steve-harris definitely knows the answer to that ;) but I wager a yes.
Secondly, what features do triplestores have that an RDBMS doesn't? The simple answer is support for SPARQL, RDF, OWL, etc. (i.e. the Semantic Web technology stack), and to make it a fair fight, it's better to define the value of SPARQL based on SPARQL 1.1 (it has considerably more features than 1.0). This provides support for federation (so, so cool), property path expressions, and entailment regimes, along with a standard set of update and graph management protocols (which SPARQL 1.0 didn't have and sorely lacked). Also, @steve-harris points out that transactions are not part of the standard (can of worms), although many vendors provide non-standardised mechanisms for transactions (Virtuoso supports JDBC- and Hibernate-compliant connection pooling and management, along with all the transactional features of Hibernate).
The big drawback in my mind is that not many triplestores support all of SPARQL 1.1 (since it is not yet a Recommendation), and this is where the real benefits lie.
Having said that, I am and always have been an advocate of substituting RDBMSs with triplestores, and the platforms I deliver run entirely off triplestores (Volkswagen in my last role was an example of this), removing the need for an RDBMS. An additional advantage is that object-to-RDF mapping is more flexible and provides more options than traditional ORM (also known as putting a square peg in a round hole).
Also, you can still use a database and just use RDF as a data exchange format, which is very flexible.
I need to test the performance of queries based on spatial data. I decided to use SQL Server and the geometry data type.
Now I need some sample data (for example maps, cities, etc.). Do you know of any resources I can use and then load into my database?
Thanks for any help
The U.S. Census Bureau makes all of their shape files available to the public. See here:
http://www.census.gov/geo/www/tiger/
Once you have picked whatever shape files you want, you can import them into SQL Server using the excellent Shape2SQL tool found here:
http://www.sharpgis.net/page/Shape2SQL.aspx
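If you prefer a scripted route instead of the GUI tool, here is a rough sketch of reading a shapefile and inserting it into a SQL Server geometry column; the library choice (fiona, shapely, pyodbc), table layout, file name, and connection string are all assumptions:

```python
# A sketch of loading a TIGER shapefile into SQL Server without Shape2SQL.
# Everything named here (DSN, table, attribute field) is a placeholder.
import fiona
import pyodbc
from shapely.geometry import shape

conn = pyodbc.connect("DSN=spatialtest")  # placeholder connection
cur = conn.cursor()
cur.execute("""
    IF OBJECT_ID('census_shapes') IS NULL
        CREATE TABLE census_shapes (id INT IDENTITY PRIMARY KEY,
                                    name NVARCHAR(200),
                                    geom GEOMETRY);
""")

with fiona.open("tl_2010_us_state10.shp") as src:   # any TIGER shapefile
    for feature in src:
        wkt = shape(feature["geometry"]).wkt          # geometry as Well-Known Text
        name = feature["properties"].get("NAME10", "")
        cur.execute(
            "INSERT INTO census_shapes (name, geom) "
            "VALUES (?, geometry::STGeomFromText(?, 4269))",  # 4269 = NAD83, typical for TIGER
            name, wkt,
        )
conn.commit()
```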
You will find many good pointers, including Ordnance Survey, CloudMade and Natural Earth, among the answers to this question:
Are there any free administrative boundaries available as shapefiles?
I want to better understand what is required in order to implement a geo-spatial (aka proximity) search. I'd appreciate clarification on the following:
1) Beyond the latitude & longitude for corresponding zip codes, what, if anything, is required?
2) Can anyone recommend any resources (books, websites, etc.) for understanding the formulas that can be used to calculate proximity, such as haversine, Vincenty, and spherical?
3) How easy and effective are MySQL's tools for implementing proximity searches?
4) Does Google Maps have an API for proximity searches? For example, if I provide a zip code, can it return zip codes within a set radius? I searched the Google Maps website but found nothing of the sort.
Thanks.
You just need to do a lookup on the zip code against a lat/lon. You can get this data at some of these sites.
This is a good reference for Haversine / great circle calculations.
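For reference, the haversine calculation itself is only a few lines; here is a self-contained sketch in Python (the 3959-mile Earth radius gives results in miles):

```python
# A plain haversine implementation: great-circle distance between two
# lat/lon points, in miles.
from math import asin, cos, radians, sin, sqrt

def haversine_miles(lat1, lon1, lat2, lon2):
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 3959 * 2 * asin(sqrt(a))

# e.g. roughly the distance between two city-centre points (NYC to Boston)
print(haversine_miles(40.7128, -74.0060, 42.3601, -71.0589))  # ~190 miles
```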
MySQL apparently has a geometry type that can be used for some of this. There are also other spatial databases that might be more useful.
I've never heard that it has that capability, but that doesn't mean that it can't. It might be better to look to one of the spatial databases listed at the link above.